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Introduction 

This  study  combines  complementary  computational  modeling  and  SAXS  expertise  (S.Y.)  with  expertise 
on  SERM-mediated  ER  LBD  structure  and  signaling  (G.G.).  It  is  anticipated  that  our  studies  on  the  DBD- 
hinge-LDB  assembly  in  ERa  and  ERp  will  determine  the  extent  to  which  inter-domain  interfaces  control 
ligand-dependent  transcriptional  activity  in  response  to  various  known  agonists,  antagonists,  or  novel 
SERMs,  providing  structural  insights  that  can  be  exploited  for  novel  targeting  of  ER-positive  breast 
cancers. 

For  clarity  and  completeness,  I  will  first  list  the  specific  aims  of  this  DoD  grant  and  the  summary  of  the 
studies  and  results  within  the  first  12  months,  followed  by  the  accomplishment  results  within  the  second 
12  months. 

Body 

A.  Specific  Aims 

Aim  1.  Determine  structural  models  of  assembly  states  of  the  DBD-hinge-LBD  complexes  in  ERa  and 
ERp  using  computational  modeling. 

la.  Develop  an  all-residue  coarse-grained  (ARCG)  model  for  molecular  dynamics  simulations.  Test  the 
ARCG  model  on  known  ER  LBD  homodimers1  first  and  then  apply  it  to  ERa  and  ERp  DBD-hinge-LBD 
complexes. 

lb.  Design  a  hierarchical  structural  clustering  algorithm  and  apply  it  to  structural  modes  generated  from 
ARCG  simulations  in  Aim  la.  Determine  a  small  number  (~10)  of  assembly  states  using  a  two-step 
procedure  that  takes  into  account  both  differences  in  detailed  residue-residue  distances  and  overall 
macromolecular  scattering. 

lc.  Compute  theoretical  SAXS  profiles  for  the  resulting  assembly  states  obtained  in  Aim  1b  by  using  a 
high-throughput,  scattering  computing  program  of  “Fast-SAXS”.  Compare  scattering  differences  and 
examine  theoretical  DBD-LBD  interaction  mechanisms  of  assembly  states. 

Aim  2.  Detect  ER  shape  changes  in  response  to  small  molecules  using  SAXS  experiments. 

2a.  Express  ERa  and  ERp  DBD-hinge-LBD  plasmid  constructs  in  bacteria  and  purify  them  to  a 
concentration  of  approximately  1  mg/ml.  Determine  the  sample  size  distribution  using  dynamic  light 
scattering.  Synthesize  and  prepare  various  ligands  and  different  ERE  oligomers  as  agents  to  elicit  ER 
conformational  changes. 


1 


2b.  Collect  SAXS  data  for  ERs  at  the  APS  and  NSLS.  SAXS  experiments  will  be  conducted  with  ER 
fragments  in  the  presence  of  several  ligands,  including  4-hydroxytamoxifen  (OHT),  173-estradiol  (E2), 
raloxifene,  and  various  known  ERE  oligomers.  Examine  differences  in  experimental  SAXS  patterns  and 
map  out  the  global  shape  changes  upon  the  binding  of  ligands  with  diverse  behaviors. 

Aim  3.  Determine  DBD-LBD  interactions  and  quantify  molecular  mechanisms  of  ER  multi-domain 
assembly. 

3a.  Use  BSS-SAXS  as  a  structural  technique2'3  to  determine  ER  multi-domain  assemblies.  Combine 
structural  models  of  assembly  states  obtained  from  computational  modeling  in  Aim  1  and  experimental 
SAXS  data  obtained  in  Aim  2  to  quantify  the  population  state  of  each  multi-domain  assembly  state  using 
a  Monte  Carlo  procedure.  Determine  DBD  and  LBD  interaction  mechanisms  under  various  conditions. 

3b.  Use  hydrogen/deuterium  (H/D)  exchange  analysis  to  define  specific  local  molecular  details  of  DBD- 
LDB  interfaces  and  identify  structural  features  of  hydrophobic  forces  and  hydrogen-bonding  at  DBD-LDB 
interfaces.  Resolve  the  multimeric  structural  assembly  mechanisms  in  ERa  and  ERp  in  response  to 
various  signals.  Make  specific  theoretical  predictions  of  sites  at  the  interface(s)  that  stabilize  specific 
assembly  states  for  functional  studies.  Use  site-directed  mutagenesis  to  confirm  predictions.  Make 
predictions  of  sites  at  the  interface(s)  that  can  be  cross-linked  to  stabilize  specific  assembly  states  for 
future  crystal  studies  to  capture  such  structural  snapshots  of  energetically  favorable  assembly  states. 

B.  Studies  and  Results  in  the  first  12  months 

The  results  described  here  include  progress  since  this  sub-award  was  made  in  April  2011.  We  have 
make  progress  with  the  simulations  of  ERp  DBD-hinge-LDB  (also  known  as  CDE)  in  Aim  la,  structural 
clustering  analyses  of  simulation  trajectories  in  Aim  1b,  and  SAXS  computing  in  Aim  1c.  In  addition,  we 
have  collected  SAXS  of  ERp  CDEF  in  complex  with  ERE  oligomers  and  ligands,  as  proposed  in  Aim  2. 


Figure  1.  Three  stable  conformations  have  been  identified  from  molecular  dynamics  simulations 
of  ERbeta  CDE.  Their  theoretical  SAXS  curves  are  also  computed  using  in-house  Fast-SAXS 
software  (Ref.  4). 


To  demonstrate  the  results  from  molecular  dynamics  (MD)  simulations  (Aim  1),  Figure  1  shows  three 
energetically  stable  conformations  observed  for  the  ERp  DBD-Hinge-LBD  (CDE)  fragment.  This  is 
achieved  by  100  separated  MD  simulation  trajectories,  and  each  has  10  microseconds,  resulting  in  a 
total  of  1  millisecond.  All  the  resulting  MD  configurations  are  clustered  into  a  small  number  of 
conformational  clusters  using  a  two-step  procedure.  For  the  SAXS  computing,  our  in-house  Fast-SAXS 
package  was  used  for  all  the  three  conformations.  The  results  in  Figure  1  show  that  each  SAXS  profile 
l(q)  is  different  from  the  others,  suggesting  experimental  SAXS  data  can  be  a  very  powerful  tool  for 
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determining  their  population  fractions.  A  previously 
published  SAXS-based  shape  reconstruction  method  will 
be  used  once  the  SAXS  data  of  this  CDE  complex  (Ref.  150  -  - 
2). 
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For  Aim  2,  the  expression  and  purification  the  CDE  and 
CDEF  domains  of  ER|3  have  now  been  largely  optimized 
and  we  are  able  to  consistently  purify  up  to  10  mg  of 
protein  at  a  concentration  of  2  to  3  mg/ml  (Figure  2). 

Higher  concentrations  have  been  more  difficult  to  obtain 
due  to  protein  precipitation.  However,  these 
concentrations  are  sufficient  for  SAXS  analyses.  We  have 
also  found  conditions  under  which  the  purification  can  be 
performed  without  ligand.  Therefore  the  same  protein 
preparation  can  be  used  for  SAXS  measurement  with  or 
without  the  addition  or  different  ligand  such  as  estradiol 
and  4-hydroxytamoxifen  (OHT)  as  well  as  ERE  oligomers 
of  different  lengths  and  sequence,  thus  allowing  direct 
comparison  between  the  same  SAXS  samples.  Initial 
experiments  were  carried  out  with  the  CDEF  fragment  of 
ERp,  which  contains  an  additional  25  amino  acids  that  are 
thought  to  be  at  least  partially  unstructured.  It  will  be 
important  to  compare  these  two  ERp  fragments  by  SAXS  because  no  existing  structural  data  exists  for 
the  F  domain  of  either  ER  subtype. 


Figure  2.  ERp  CDEF  protein  is 
visualized  by  Coomassie  blue 
staining  on  an  SDS-PAGE  gel  after 
purification 


For  AIM  2b,  purified  ERp  CDEF  fragment  has  been 
analyzed  by  SAXS  in  the  presence  and  absence  of  a 
known  oligomeric  ER  DNA  binding  site  (ERE). 
Figure  3  shows  the  SAXS  data  for  this  polypeptide 
in  the  presence  of  ERE,  demonstrating  that  we  have 
been  able  to  obtain  high-quality  SAXS  data  for  this 
functional  complex.  Shown  is  the  plot  of  l(q)  vs  q  (in 
a  logarithmic  scale),  where  q  is  the  amplitude  of  X- 
ray  momentum  transfer  during  the  scattering.  A 
triplicate  of  20  uL  of  sample  was  continuously  flowed 
through  a  1  mm  diameter  capillary  and  each  was 
exposed  for  30  sec.  The  data  were  recorded  under 
the  condition  of  the  X-ray  energy  of  14  keV  and  the 
temperature  of  10  degree.  This  result  lays  the 
foundation  for  further  studies  of  the  receptor  in  the 
presence  of  different  ligands  and  different  specific 
DNA  sequences  that  are  associated  with  naturally 
occurring  ER  target  genes. 


Key  Research  Accomplishments 


Figure  3.  Experimental  SAXS  of  ERbeta 
CDEF  in  the  presence  of  DNA.  Data  were 
collected  at  the  NSLS  in  Oct.  2011. 


•  Successful  modeling  of  ERp  CDE  bound  to  DNA 

•  Successful  purification  of  ERp  CDEF 

•  Successful  collection  of  SAXS  data  of  apo  ERp  CDEF  in  the  presence  of  DNA  at  the  NSLS 
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C.  Studies  and  Results  in  the  second  12  months 


The  results  described  here  include  progress  since  April  2012.  First,  we  have  developed  a  theoretical 
simulation  pipeline  to  study  the  general  protein-protein  association  as  proposed  in  Aim  la.  This  work  has 
been  published  (5)  and  highlighted  on  the  cover  of  Biophysical  Journal  (see  Figure  4).  Specifically,  a 
coarse-grained  (CG)  model  was  implemented  and  its  applications  were  demonstrated  via  molecular 
dynamics  simulations  for  several  protein  complexes.  Second,  an  enhanced  search  method  was  utilized 
to  efficiently  sample  a  broad  range  of  protein  conformations.  Third,  multiple  conformations  were  identified 
and  clustered  from  simulation  data  and  further  projected  on  a  three-dimensional  globe  specifying  protein 
orientations  and  interacting  energies.  Results  from  several  complexes  indicate  that  the  crystal-like 
conformation  is  favorable  on  the  energy 
landscape  even  if  the  landscape  is  relatively 
rugged  with  meta-stable  conformations.  A 
closer  examination  on  molecular  forces 
shows  that  the  formation  of  associated 
protein  complexes  can  be  primarily 
electrostatics-driven,  hydrophobics-driven, 
or  a  combination  of  both  in  stabilizing 
specific  binding  interfaces.  Taken  together, 
these  results  suggest  that  the  CG 
simulations  and  analyses  provide  a  new 
tool-set  to  study  protein-protein  association 
occurring  in  functional  biomolecular 
complexes.  The  energy  globe  analysis 
(Figure  4)  is  capable  of  identifying  distinct  putative  conformations  that  can  serve  as  a  basis  set  to 
analyze  SAXS  data.  The  application  of  this  PPR-CGMD  simulation  pipeline  has  been  tested  on  five 
different  protein-protein  complexes  before  it  is  applied  to  ERa  simulations. 


Biophysical 

Journal 


Figure  4.  Illustrative  energy  landscape 
on  a  globe  based  on  coarse-grained 
simulations  The  globe  provides  an  overview 
of  the  landscape  by  specifying  the  relative 
orientations  and  interaction  energies  of  the 
two  proteins:  one  is  at  the  center  of  the  globe 
(in  yellow)  and  the  other  is  on  the  surface.  The 
globe  is  colored  and  plotted  with  contour  lines 
based  on  the  interaction  energies  of  a  large 
set  of  simulated  orientations.  Three  distinct 
orientations  are  illustrated  for  the  second 
protein  in  red,  green,  and  orange, 
respectively,  each  located  in  an  energy  basin 
on  the  surface.  This  energy  globe  is  intuitive  in 
outlining  energetically  favorable  complex¬ 
forming  conformations  from  a  landscape 
perspective.  Taken  from  the  August.  2012 
issue  of  the  Biophysical  Journal  cover. 


Second,  we  have  performed  extensive  CG  simulations  on 
ERa  DBD-hinge-LDB  (also  known  as  CDE;  see  Figure  5) 
in  Aim  la.  Based  on  the  simulations  data,  a  structure 
clustering  analysis  is  performed  on  all  the  "free"  simulation 
trajectories  in  Aim  1b,  where  molecular  dynamics  (MD) 
simulations  are  implemented  and  enhanced  sampling  is 
achieved  via  the  push-pull-release  (PPR)  strategy  in  the 
search  for  different  LBD-DBD  orientations.  Specifically, 
this  search  method  is  designed  to  facilitate  domain-domain 
dissociation  and  re-association,  which  may  be  difficult  for 
brute-force  simulations  once  associated.  This  is  achieved 
via  a  harmonic  potential  is  imposed  on  the  center-of-mass 
distance  between  LBDs  and  DBDs  to  pull  them  apart  when 

they  are  close  and  to  push  them  together  when  far  away.  Then,  this  bias  is  removed  so  LBDs  and  DBDs 
are  free  to  interact  when  they  are  close  enough.  This  strategy  helps  the  simulations  to  avoid  the  trapping 
due  to  local  stable  complex-forming  conformations,  thus  enabling  the  sampling  of  different  ER 
conformations. 


Figure  5.  Schematic  diagram  of  ERa 
functional  domains  of  a  ligand- 
binding  domain  (LBD)  and  a  DNA- 
binding  domain  (DBD).  LBD  and  DBD 
dimers  are  displayed  in  green  and  blue, 
respectively,  and  their  connecting  hinge 
loops  are  marked  by  gray  dots.  The  DBD- 
specific  DNA  duplex  is  shown  as  ribbon  in 
white.  The  activation  helix  H12  Is  highlighted 
in  red.  Taken  from  the  Huang  #f  ol,  2013. 


Key  results  from  the  simulations  are 
the  identification  of  multiple 
energetically  stable  ERa  conformations 
on  the  landscape  (see  Figure  6).  An 
important  finding  is  that  estradiol- 
bound  LBDs  utilize  the  well-described 
activation  helix  HI 2  to  pack  and 

Stabilize  LBD-DBD  interactions.  Our  Figure  6.  Possible  ERa  structure  models  identified  from  the  simulations, 
results  suggest  that  the  estradiol-  Taken  from  the  Huang  *  oi,  2013. 

bound  LBDs  can  serve  as  a  scaffold  to  position  and  stabilize  the  DBD-DNA  complex,  consistent  with 
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experimental  observations  of  enhanced  DNA  binding  with  the  LBD.  Final  assessment  using  atomic-level 
simulations  shows  that  these  CG-predicted  models  are  significantly  stable  within  a  15-ns  simulation 
window  and  specific  pairs  of  lysine  residues  in  close  proximity  at  the  domain  interfaces.  Together,  these 
simulation  results  provide  a  molecular  view  of  the  role  of  ERa  domain  interactions  in  response  to 
hormone  binding.  Reports  on  detailed  findings  have  been  published  recently  (see  Ref.  6). 

As  the  proposed  models  have  very  distinct  global  shapes,  we  propose  later  to  use  a  synchrotron-based 
technique,  small-angle  solution  X-ray  scattering  (SAXS),  to  distinguish  the  correct  model.  For  a 
comparison  with  experimental  data,  we  will  compute  SAXS  profiles  for  these  structure  models,  using  the 
new  SAXS  computing  algorithm  that  the  co-PI 
has  recently  developed,  termed  Fast-SAXS-pro 
(see  Figure  7  and  Ref.  7).  This  method  is  a 
natural  extension  based  on  our  previously 
developed  methods  for  protein  and  RNA 
scattering  (3,4).  These  calculated  SAXS 
profiles  will  be  used  for  the  matching  with 
measured  experimental  data  to  characterize  the 
best-fit  model.  This  new  SAXS  computing 
algorithm  has  been  applied  to  predicted  ERa 
conformations  as  proposed  in  Aim  1c,  which 
will  enable  the  data  inteperation  of  measured 
experimental  SAXS  data. 

Finally,  as  proposed  in  Aim  2,  we  have  been  improving  the  sample  quality  of  the  ERp  CDEF  in  complex 
with  ERE  oligomers  and  ligands.  Since  April  2012,  the  protein  purity  has  been  significantly  improved,  in 
which  no  sign  of  protein  aggregation  occurs.  This  result  suggests  that  the  ERp  CDEF  can  form  a  stable 
complex  with  estradiol  and  the  ERE  oligomers.  One  issue  still  to  be  resolved  is  that  we  sometimes 
observe  the  presence  of  two  protein  bands  on  SDS-PAGE  gels  (data  not  shown).  This  observation 
suggests  that  ERp  CDEF  may  be  prone  to  protease  digestion  during  or  preceding  purification,  or  during 
handling  after  purification.  Because  this  is  a  sporadic  problem,  future  efforts  will  include  the  use  of  a 
chromatography-coupled  SAXS  data  setup,  which  will  ensure  the  sample  homogeneity  critical  for  SAXS 
data  collection. 

Reportable  Outcomes 

Two  papers  have  resulted  from  this  funded  work.  One  (5)  has  been  published  and  the  other  (6)  has  been 
accepted  for  publication.  Both  are  included  in  the  appendix. 

Conclusion 

An  important  goal  of  this  study  is  to  generate  homogenous  ER  DBD-hinge-LDB  (also  known  as  CDE) 
protein  samples  needed  for  SAXS  studies.  Up  until  now,  we  have  collected  SAXS  data  of  CDEF 
samples,  but  not  that  of  CDE.  We  will  continue  to  generate  and  compare  the  ERp  CDE  and  CDEF 
fragments  for  small  angle  x-ray  scattering  (SAXS)  analyses  of  ER  multidomain  fragments  ±  EREs  and 
various  peptides  to  obtain  solution  structure  information.  Having  worked  out  the  conditions  for  the 
expression  and  purification  of  ERp  CDEF,  we  do  not  expect  difficulties  in  the  expression  and  purification 
of  ERp  CDE  domain.  Future  work  will  also  be  focused  on  optimizing  the  purification  of  ERa  CDE.  ERa 
protein  purification  has  been  more  challenging  because  of  the  longer  hinge  region  that  separate  the  DBD 
and  LBD  that  is  more  prone  to  proteases  digestion. 


Figure  7.  Schematic  for  a 
new  SAXS  computing 
algorithm  for  protein-DNA 
complexes.  Modified  from 
Ravikumar  et  al.,  (2013). 
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Abstract 


Estrogen  receptor  alpha  (ERa)  is  a  hormone-responsive  transcription  factor  that  contains  sev¬ 
eral  discrete  functional  domains  including  a  ligand-binding  domain  (LBD)  and  a  DNA-binding 
domain  (DBD).  Despite  a  wealth  of  knowledge  about  the  behaviors  of  individual  domains,  the 
molecular  mechanisms  of  cross-talk  between  LBD  and  DBD  during  signal  transduction  from 
hormone  to  DNA-binding  of  ERa  remain  elusive.  Here,  we  apply  a  multi-scale  approach  com¬ 
bining  coarse-grained  (CG)  and  atomistically-detailed  simulations  to  characterize  this  cross¬ 
talk  mechanism  via  an  investigation  of  the  ERa  conformational  landscape.  First,  a  CG  model 
of  ERa  is  built  based  on  crystal  structures  of  individual  LBDs  and  DBDs,  with  more  empha¬ 
sis  on  their  inter-domain  interactions.  Second,  molecular  dynamics  (MD)  simulations  are  im¬ 
plemented  and  enhanced  sampling  is  achieved  via  the  push-pull-release  (PPR)  strategy  in  the 
search  for  different  LBD-DBD  orientations.  Third,  multiple  energetically  stable  ERa  confor¬ 
mations  are  identified  on  the  landscape.  A  key  finding  is  that  estradiol-bound  LBDs  utilize  the 
well-described  activation  helix  H12  to  pack  and  stabilize  LBD-DBD  interactions.  Our  results 
suggest  that  the  estradiol-bound  LBDs  can  serve  as  a  scaffold  to  position  and  stabilize  the  DBD- 
DNA  complex,  consistent  with  experimental  observations  of  enhanced  DNA  binding  with  the 
LBD.  Final  assessment  using  atomic-level  simulations  shows  that  these  CG-predicted  models 
are  significantly  stable  within  a  15 -ns  simulation  window  and  specific  pairs  of  lysine  residues 
in  close  proximity  at  the  domain  interfaces  could  serve  as  candidate  sites  for  chemical  cross- 
linking  studies.  Together,  these  simulation  results  provide  a  molecular  view  of  the  role  of  ERa 
domain  interactions  in  response  to  hormone  binding. 


Introduction 


Estrogen  receptor  alpha  (ERa),  a  member  of  the  nuclear  hormone  receptor  (NHR)  family,  plays 
a  key  role  in  eukaryotic  transcriptional  regulation.1, 2,3,4  Once  activated  by  hormones  such  as 
estradiol,  ERa  can  bind  specific  DNA  sequences  as  a  homodimer  and  modulate  the  expression 
of  genes  involved  in  the  development,  homeostasis  and  metabolism.5, 6’ 7?  8’ 9  Because  ERa  is  a 
key  target  for  therapeutic  intervention10,  n* 12,  it  is  vital  to  obtain  a  better  molecular  understand¬ 
ing  of  ERa  action  that  results  from  hormone  signaling. 

ERa  is  a  multi-domain  protein  fold  containing  a  DNA-binding  domain  (DBD)  and  a  ligand 
binding  domain  (LBD)2, 13 .  Their  individual  crystal  structures  have  been  determined  (as  illus¬ 
trated  in  Fig.  1).  These  two  domains  are  connected  by  a  50-residue  flexible  hinge  region  that 
allows  these  domains  to  move  freely  in  three  dimensions.  It  has  been  observed  that  the  DBD 
alone  can  bind  a  double-strand  DNA14, 15, 16,  and  that  this  DNA  binding  of  the  receptor  is  tighter 
in  the  presence  of  LBD  when  bound  to  estradiol6.  At  a  structural  level,  this  estradiol  binding  is 
also  coupled  with  a  major  rearrangement  of  the  critical  activation  helix  H12  at  the  C-terminus, 
which  is  positioned  on  the  LBD  surface  and  covers  the  estradiol  binding  pocket17, 18, 19, 20, 21 . 

The  mechanisms  by  which  ERa  DBDs  and  LBDs  interact  with  each  other  when  ERa  is 
bound  to  DNA  are  still  unclear.  For  several  other  members  of  the  NHR  family,  it  has  been 
demonstrated  that  the  LBD-DBD  interface  plays  an  allosteric  role  to  mediate  the  signal  trans¬ 
duction  between  ligand  binding  and  DNA  association22,23,24.  For  example,  a  single  point  mu¬ 
tation  at  the  interface  of  the  PPAR/RXR  complex  affects  its  DNA  binding  properties  to  a  PPAR 
response  element22.  More  recently,  mutagenesis  studies  in  the  androgen  receptor  (AR)  show 
that  several  mutations  in  the  AR  LBD  can  decrease  DNA  binding  affinity,  and  conversely,  mu¬ 
tations  in  the  DBD  also  affect  ligand  binding24.  These  observations  strongly  suggest  the  internal 
communication  between  LBDs  and  DBDs  is  required  to  facilitate  in  vitro  DNA  binding.  For 
ERa,  how  these  domains  interact  and  communicate  remains  largely  unknown,  in  part  due  to 
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the  lack  of  structural  information  of  a  full  LBD-DBD/DNA  complex  and  the  lack  of  knowledge 
about  the  LBD-DBD  interactions  at  a  molecular  level. 

In  this  study,  we  used  a  computational  approach  of  coarse-grained  and  subsequent  all-atom 
molecular  dynamics  (MD)  simulations  to  explore  the  physical  interactions  between  the  ERa 
DBD  and  LBD  domains  from  an  energy  landscape  perspective.  Specifically,  a  coarse-grained 
(CG)  model  and  an  efficient  sampling  method  were  implemented  to  simulate  the  ERa  confor¬ 
mational  landscape.  The  projection  of  simulation  data  onto  a  three-dimensional  globe  allows 
one  to  picture  the  energy  landscape  of  ERa  multi-domain  interactions.  Multiple  ERa  confor¬ 
mational  states  were  identified  from  the  landscape,  highlighting  different  modes  of  LBD-DBD 
interactions.  Final  structural  refinement  via  all-atom  simulations  starting  from  CG-predicted 
models  provides  detailed  structural  features  at  the  domain  interfaces  that  could  be  experimen¬ 
tally  tested.  Overall,  these  results  provide  new  speculation  into  ERa  domain  cross-talk  mecha¬ 
nisms  responsible  for  the  allosteric  control  of  receptor  binding  to  DNA. 

Methods 

Details  of  the  CG  model 

A  coarse-grained  (CG)  model  was  implemented  where  each  amino  acid  of  ERa  is  represented 
by  a  single  bead  positioned  at  its  Ca  atom  and  each  nucleotide  of  DNA  is  represented  by  its 
05/  atom.  The  CG  energy  function  for  ERa  (E181-P552)  is  defined  for  each  component  as  fol¬ 
lows.  The  energy  functions  for  the  estradiol-bound  LBD  dimer  (S309-P552)  and  the  DBD  dimer 
(E181-K252,  in  complex  with  a  17-bp  double-stranded  DNA)  were  built  based  on  their  crystal 
structures  (PDB  entries  1QKU  and  1HCQ)15,20,  using  the  widely  used  G5-type  potential.  The 
hinge  region  connecting  DBD  and  LBD  (G253-L308)  was  built  using  loop  modeling  method 
in  MODELLER  25 .  Bonded  interactions,  including  bond,  angle  and  dihedral  terms,  were  mod¬ 
eled  using  typical  energy  functions  as  done  previously  (see  details  in  Refs.26, 27).  Non-bonded 
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attractive  interactions  between  native  contact  pairs  between  LBDs  and  DBDs  were  modeled  us¬ 
ing  Lennard-Jones  (LJ)  interactions,  £lj  =  Lijeo[5(o?  /r;7)12  —  6(a" /r//) 10] .  where  /■;/  is  the 
residue-residue  distance,  and  o?-  is  the  corresponding  distance  in  the  reference  structure.  The 
contact  pairs  within  each  domain  were  defined  using  the  CSU  software28.  Native-like  DBD-DNA 
interactions  were  modeled  using  e0  =  5  kcal/mol  whenever  residue-nucleotide  pairs  were  within 
a  distance  of  15  A. 

The  interactions  between  LBDs  and  DBDs  are  specified  by  the  energy  function  L’lbd-dbd, 
which  includes  electrostatic  (Eeiec)  and  hydrophobic  (£h)  components, 

LlBD-DBD  =  Lelec  +  EH.  ( 1 ) 

Here,  Es iec  =  j  q,q}  / f  47i£,,/Ltrri/ )  was  used  where  q\  is  the  charge  of  residue  i  and  e0  is  the 
vacuum  electric  permittivity.  An  effective  dielectric  coefficient  Deff  =  Dsexp  (r,j/q)  is  applied  to 
reflect  the  shielding  effect  between  two  residues  separated  by  a  distance  of  /•;/ .  Ds  =  10  was  used 
to  describe  the  local  dielectric  environment  when  two  domains  are  forming  a  close  interface  and 
%  -  8.2  A  to  mimic  the  screening  effect  at  a  150-mM  salt  concentration.  At  pH  7,  residue  charges 
qi  -  +e  for  Lys  and  Arg,  -e  for  Asp  and  Glu,  and  +0.5<?  for  His  (e  is  the  elementary  charge)  were 
used26,29.  Hydrophobic  interactions  ( E\ \ )  are  either  attractive  (LJ-type)  (£,/  <  0)  or  purely 
repulsive  (£,;  >  0)  where  ey  =  +  (3)  between  residues  i,  j  and  is  the  Miyazawa- 

Jernigan  (MJ)  statistical  energy.30  The  values  of  a  =  0.4  and  P  =  1.3  were  used  to  balance 
attractive  and  repulsive  interactions.26  We  used  =  £(/-|  [5(g(// r(/-) 12  —  6(g,7/ r,/) l0]  if 

Eij  <  0  ,  and  En(i,j)  =  eIy[5(a,y/r,7-)12(l  -  exp(-(rij  -  Gtj)/d )2)]  if  £,y  >  0,  where  d  =  3.8  A. 
The  distance  G,y  is  defined  by 

<*ij=y(n  +  rj)  (2) 

where  the  value  of  y  =  0.625  was  chosen26  and  is  the  van  der  Waals  radius  of  residue  i29.  The 
non-native  interactions  of  DNA  with  LBDs  and  the  hinge  region  were  mostly  repulsive  with  £(/ 
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=  1  kcal/mol. 


Details  of  the  PPR  sampling  strategy 

The  CG  model  was  implemented  via  Langevin  molecular  dynamics  (MD)  simulations  in  a  mod¬ 
ified  version  of  the  CHARMM  program.31  Simulations  were  performed  at  300  K  with  a  friction 
coefficient  of  50  ps-1.32  A  simulation  time  step  of  10  fs  was  used  and  simulated  coordinates 
were  saved  every  100  ps.  A  sampling  strategy  of  push-pull-release  (PPR)  was  implemented  via 
a  biasing  potential  £ppr26, 

{0,  when  R,  <  Rc  and  rmin  <  r0, 

(3) 

k(R  —  Rt)2,  otherwise. 

where  R  is  the  instantaneous  center-of-mass  distance  between  two  selected  domains,  Rt  is  its 
corresponding  target  distance,  and  is  the  closest  residue-residue  distance  between  two  do¬ 
mains.  The  two  domains  move  between  Rmin  <Rt<  Rmax ,  where  the  values  of  Rmjn  =  0  A  and 
Rmax  =  100  A  were  used.  This  isppR  potential  is  applied  to  each  PPR  cycle  with  a  simulation 
time  of  L  and  repeated  for  N  times,  where  each  cycle  includes  three  steps:  (1)  push  the  two 
domains  away  from  each  other  when  they  are  close,  (2)  pull  the  domains  closer  until  they  are 
separated  by  a  threshold  distance  Rc,  and  (3)  release  them  to  interact  freely  by  removing  the 
biasing  potential.  For  ERa  simulations,  the  parameters  N  =  5,L  =  25ns,k=  100  kcal/(mol-A2), 
ra  =  7.6  A,  and  Rc  =  45  A  were  used.  Four  sets  of  PPR-assisted  simulations  were  performed, 
each  accounting  for  a  different  biasing  pair  between  two  LBDs  and  two  DBDs.  Within  each 
set,  a  total  of  145  simulation  runs  were  carried  out  simultaneously,  each  with  a  different  initial 
configuration  of  LBD-DBD  relative  orientation.26  A  simulation  time  of  145  ps  was  achieved  in 
total. 


5 


Details  of  orientation-based  clustering  analysis 

A  structural  clustering  analysis  is  applied  mainly  based  on  relative  orientations  between  do¬ 
mains.  First,  structural  alignment  is  performed  on  LBDs  only.  Second,  resultant  configurations 
of  DBDs  are  used  to  calculate  pair-wise  RMSD  values.  Finally,  these  RMSD  values  are  used  as 
an  input  for  a  hierarchical  clustering  analysis  as  implemented  in  MTLAB  .  This  clustering  anal¬ 
ysis  was  applied  to  selected  simulation  snapshots  if  they  have  the  lowest  £lbd-dbd  snapshot 
from  the  last  5 -ns  segment  in  each  PPR  cycle,  where  £lbd-dbd  <  0  kcal/mol,  and  the  RMSD 
values  of  individual  DBD  and  LBD  monomers  are  within  1.5  A  and  3  A  to  their  corresponding 
crystal  structures,  respectively.  In  addition,  mirror  conformations  by  exchanging  the  coordi¬ 
nates  of  two  protein  chains  were  generated  due  to  ERa  homodimeric  symmetry.  Thus,  a  total 
number  of  9780  configurations  were  used  for  the  clustering  analysis. 

Details  of  all-atom  simulations 

All-atom  MD  simulations  were  performed  starting  with  CG-predicted  conformations.  Initial 
ERa  models  were  built  based  on  crystal  structures  of  individual  domains.  First,  the  CG  models 
were  used  as  the  templates  to  build  atomically-detailed  structures,  where  crystal  structures  of 
the  LBDs  (PDB  entry  1QKU20)  and  DBDs  (PDB  entry  1HCQ15)  were  aligned  on  the  top  of  Ca 
atoms  of  CG  models.  Second,  the  restraints  of  Ca  atoms  within  domains  towards  CG  models 
were  achieved  via  targeted  MD  simulations.33  An  implicit  solvent  GB  model  was  used  to  reduce 
the  computational  cost34  and  a  total  of  40-ns  simulation  time  was  performed  to  reach  a  Ca- 
RMSD  value  of  0.5  A  (relative  to  the  CG  models  of  ERa  domains).  Finally,  standard  all-atom 
simulations  without  any  bias  were  performed  for  model  refinement  using  the  program  NAMD35, 
where  the  force  fields  of  AMBER99SB36  and  AMBER99bsc037, 38  were  used.  Parameters  for 
the  estradiol  were  generated  using  GAFF39  and  ANTECHAMBER40, 41, 42 .  All  parameters  were 
prepared  using  the  AMBER  LEaP  module43.  The  system  was  placed  in  a  rectangular  water 
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box  with  more  than  12  A  padding  and  16  charge-neutralizing  sodium  ions  around  the  solute. 
Langevin  simulations  with  a  total  of  15  ns  were  performed  for  each  state  at  a  pressure  of  1  atm 
and  a  temperature  of  300  K. 

Results  and  Discussion 

A  cartoon  representation  of  the  ERa  multi-domain  architecture  is  shown  in  Figure  1,  illustrating 
ERa  domains  for  both  ligand  and  DNA  binding  (LBD  and  DBD).  Here,  a  recently  developed 
coarse-grained  MD  simulation  pipeline  was  applied  to  simulate  ERa  dynamics.26  Specifically, 
the  search  for  different  LBD-DBD  assemblies  was  enhanced  via  a  sampling  strategy  of  push- 
pull-release  (PPR).  After  grouping  similar  configurations  into  a  conformational  state  via  a  hier¬ 
archical  clustering  scheme,  ERa  simulation  data  were  projected  onto  a  three-dimensional  globe 
specifying  the  ERa  inter-domain  orientation,  and  the  interacting  energy  of  each  state  was  used 
to  visualize  the  LBD-DBD  energy  landscape  and  identify  favorable  conformations  accessible 
on  the  landscape.  Finally,  the  structural  features  of  stable  ERa  conformational  states  were 
illustrated  to  highlight  their  inter-domain  interactions. 

CG  simulations  with  the  PPR  sampling  strategy 

The  CG  model  implemented  in  ERa  simulations  has  two  main  components:  intra-  and  inter¬ 
domain  interactions.  For  each  component  of  the  LBD-DBD  complex,  a  Go-like  model  was  used 
to  model  the  dynamics  within  both  the  LBD  dimer  and  the  DBD  dimer  by  taking  advantage  of 
the  availability  of  their  crystal  structures.15, 20  This  treatment  allowed  each  dimer  to  structurally 
fluctuate  around  its  native  conformation,  as  well  as  maintain  substantive  flexibility  as  required 
for  conformational  deformation  in  an  encounter  complex.  For  the  inter-domain  LBD-DBD 
interactions,  a  residue-specific  energy  function  of  Llbd-dbd  (Eq.  1)  was  implemented  by  ac¬ 
counting  for  explicit  electrostatic  and  hydrophobic  interactions  (see  details  in  Methods).  Thus, 
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a  CG  model  of  ERa  was  built  based  on  crystal  structures  as  well  as  physical  principles  for  the 
purpose  of  MD  simulations. 

The  search  for  ERa  conformations  was  enhanced  via  the  sampling  strategy  of  push-pull- 
release  (PPR).  This  PPR  approach  was  adopted  to  facilitate  the  dissociation  and  re-association 
between  LBDs  and  DBDs  (see  Ref.  26  and  Methods).  This  sampling  enhancement  was  achieved 
by  applying  a  biasing  potential  to  the  center-of-mass  distance  of  the  two  domains  (Eq.  3),  which 
can  push  the  domains  together  to  associate,  pull  them  away  to  disassociate,  and  release  them  to 
relax  by  removing  the  biasing  potential  when  the  domains  are  encountered.  It  should  be  noted 
that  during  the  pulling  and  pushing,  two  domains  were  allowed  to  rotate  freely,  thus  helping 
achieve  broader  sampling  in  their  relative  orientations. 

To  illustrate  the  sampling  coverage  achieved  by  PPR,  a  three-dimensional  sphere  was  used  to 
specify  LBD-DBD  orientations.  The  LBDs  were  first  used  for  the  alignment  of  each  simulation 
snapshot  and  placed  in  the  center  of  this  globe.  The  center-of-mass  of  DBDs  was  then  projected 
onto  the  surface  of  the  globe  [Fig.  2(A)].  Compared  to  the  simulations  without  using  PPR, 
Figure  2(B)  shows  that  within  a  same  simulation  window  of  125  ns,  the  sampling  efficiency 
was  considerably  enhanced  by  PPR  with  a  much  larger  surface  area  spanned  on  the  globe.  The 
sampling  of  these  PPR-assisted  simulations  was  further  accelerated  by  launching  a  set  of  145 
parallel  CGMD  runs  each  with  a  different  initial  configuration,  as  illustrated  in  Figure  SI  (A).  In 
addition,  four  sets  of  such  simulations  were  performed,  each  with  a  different  biasing  potential 
between  a  pair  of  monomeric  LBD  and  DBD  (see  Methods).  As  a  result,  a  total  of  145  ps 
simulation  time  was  achieved  and  an  accumulated  61.625-^s  from  unbiased  PPR  segments  was 
used  for  the  rest  of  the  analyses. 

Figure  2(C)  illustrates  the  sampled  area  of  the  globe  covered  by  simulation  snapshots,  where 
each  snapshot  is  represented  by  a  blue  dot.  Clearly,  most  of  the  surface  area  was  sampled  by 
PPR-assisted  simulation  trajectories,  suggesting  that  sufficient  sampling  of  various  LBD-DBD 
orientations  was  obtained.  We  note  that  both  front  and  back  views  of  the  globe  were  based 
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on  the  placement  of  LBDs  in  the  center  [Fig.  S1(B)];  an  alternative  placement  of  DBDs  in 
the  center  is  also  shown  in  Figure  S1(C),  where  more  than  half  of  the  sphere  is  well  covered 
(and  the  other  half  is  not  covered  due  to  the  blockage  of  its  DNA  binding).  Together,  these 
results  suggest  that  this  PPR  strategy  substantially  assisted  the  CGMD  simulations  with  a  rather 
comprehensive  sampling  coverage  in  the  ERa  domain  orientational  space. 

Energy  landscape  on  a  globe:  clustering  and  projection 

The  visualization  of  ERa  conformational  landscape  was  analyzed  by  projecting  simulation  data 
onto  a  three-dimensional  globe  after  clustering.26  First,  a  hierarchical  clustering  algorithm  was 
implemented  to  organize  the  large  amount  of  simulation  data,  which  was  based  on  pair-wise 
RMSDs  of  the  DBDs  after  LBDs  were  aligned  (see  Methods).  We  also  note  that  only  the  last 
5-ns  segment  within  the  release  part  of  each  PPR  cycle  was  used  for  the  analysis  because  a 
period  of  equilibration  time  before  this  segment  was  observed  to  be  helpful  for  the  relaxation  of 
the  configurations  coming  out  of  PPR  pushing  segments,  as  shown  in  Figure  S2.  As  a  result,  a 
total  of  4890  structures  were  selected  from  the  entire  CGMD  simulations,  each  representing  the 
lowest  E'lbd-dbd  configuration  from  its  corresponding  PPR  cycle.  Figure  3(A)  shows  a  plot 
of  Flbd-dbd  versus  radius  of  gyration  (R„)  for  these  configurations.  Clearly,  a  broad  range  of 
configurations  were  observed  in  the  simulations,  ranging  in  Elbd-dbd  from  0  kcal/mol  to  —54 
kcal/mol  and  Rg  from  28  A  to  39  A. 

Second,  the  energetics  of  ERa  conformational  landscape  was  achieved  by  the  projection  of 
resulting  clusters  onto  an  energy  globe.  Specifically,  this  globe  was  colored  based  on  the  lowest 
£lbd-dbd  value  within  each  cluster.  Figure  3(B)  illustrates  the  projected  energy  globe,  charac¬ 
terizing  the  ERa  overall  energy  landscape.  This  characterization  was  achieved  with  a  clustering 
analysis  procedure  with  a  total  number  of  clusters  N  =  300.  Additional  analyses  using  N  =  500 
and  N  =  1000  yielded  a  similar  overall  picture  of  the  energy-colored  globe  [Fig.  S3(A)].  Thus, 
the  globe  resulting  from  N  =  300  clusters  was  used  to  complete  the  analysis.  The  lowest  energy 
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configurations  in  the  top  25%  and  30%  of  all  unbiased  simulations  were  also  projected  onto 
the  energy  globe  [Fig.  S3(B)].  These  results  show  that  each  state  remains  at  a  similar  energy 
basin  when  different  pools  of  configurations  are  used  in  the  projection,  suggesting  that  these 
CG-identified  states  are  capable  of  representing  and  illustrating  the  simulated  landscape.  Over¬ 
all,  compared  to  traditional  two-dimensional  plots  as  shown  in  Figure  3(A),  this  energy  globe 
analysis  offers  an  advantage  for  the  navigation  of  a  complex  conformational  space  and  for  the 
identification  of  stable  ERa  conformations. 

Identification  of  multiple  ERa  conformations 

The  energy-landscape-on-a-globe  analysis  readily  revealed  that  multiple  ERa  conformational 
states  are  physically  accessible  via  CGMD  simulations.  Figure  3(B)  outlines  five  energetically 
favorable  ERa  states  (marked  by  i-v),  which  can  be  easily  identified  from  this  colored  globe. 
Clearly,  these  states  differ  not  only  in  their  LBD-DBD  interacting  energies,  but  also  in  their 
domain  interfaces.  Based  on  their  surface  views  [Fig.  3(C)]  and  structural  ensembles  (Fig.  S4), 
both  states  iv  and  v  are  in  a  more  extended  conformation  with  larger  Rg  values  than  states 
i.  //,  and  iii.  Within  these  two  states,  only  one  monomer,  either  LBD  or  DBD,  is  involved  in 
LBD-DBD  interactions,  while  both  DBDs  make  contact  with  both  LBDs  in  three  other  states. 
Notably,  such  a  large  difference  in  overall  size,  e.g.,  about  3  A  between  state  iii  and  state  iv/v, 
could  be  experimentally  characterized  by  small-angle  X-ray  scattering  (SAXS). 

Among  these  observed  states,  different  LBD-DBD  binding  interfaces  are  formed.  For  ex¬ 
ample,  state  iv  is  located  at  the  upper  region  of  the  globe,  which  has  a  distinct  LBD  binding 
surface  compared  to  the  rest  of  four  states  [Figs.  3(C)  and  S5(A)].  It  should  be  noted  that  this 
interface  is  observed  to  be  partially  involved  in  the  interactions  of  ERa  LBD  with  a  peptide  an¬ 
tagonist  [Fig.  S5(B)].44, 45  A  closer  look  at  the  LBD  surface  shows  that  a  similar  region  is  shared 
and  utilized  for  the  formation  of  close  LBD-DBD  interfaces  at  states  i,  z7,  iii ,  and  v  [Fig.  S6(A)], 
where  mostly  one  LBD  monomer  participates  in  the  interfacial  interactions.  From  the  LBD 
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side,  this  interface  mainly  consists  of  two  distant  sequence  segments:  helices  H3-H5  and  helix 
H12,  marked  in  blue  and  magenta,  respectively  [Fig.  4(A)].  A  portion  of  these  LBD  surfaces 
also  form  the  binding  site  for  the  interactions  with  a  co-activator  peptide19  [Fig.  S6(B)]  as  well 
as  for  interactions  with  4-hydroxytamoxifen  observed  in  ER(346. 

One  striking  feature  is  that  HI 2  is  crucial  in  mediating  LBD-DBD  interactions  in  several 
conformational  states.  Figure  4(B)  shows  that  H12  forms  a  large  number  of  contacts  with 
DBD  at  states  i,  ii,  iii ,  and  v,  and  most  of  them  contribute  to  the  predominant  hydrophobic  inter¬ 
domain  interactions  (Fig.  S7).  This  involvement  of  H12  was  further  tested  by  mutating  several 
charged  and  hydrophobic  residues  in  helix  H12  (Fig.  S8).  The  destabilization  of  domain  in¬ 
teractions  is  clearly  reflected  by  the  loss  of  interacting  energies,  suggesting  that  H12  plays  a 
structural  role  in  gluing  and  stabilizing  inter-domain  interactions.  Furthermore,  compared  with 
previous  crystallographic  studies  on  LBDs  alone,  in  which  this  H12  helix  is  structurally  re¬ 
orientated  upon  estradiol  binding.17, 19,20  the  H12  involved  in  domain  interface  maintains  this 
estradiol-bound  orientation  in  states  j,  ii  and  iii.  In  fact,  this  particular  HI 2  adopts  a  similar 
estradiol-bound  conformation,  in  which  its  native  contacts  with  LBD  were  sacrificed,  through¬ 
out  all  of  these  low  L'lbd-dbd  states,  suggesting  that  its  interactions  with  DBD  can  stabilize  re¬ 
ceptor  LBD-DBD  conformations.  Given  the  previous  observation  that  the  LBD,  in  the  presence 
of  hormone,  promotes  or  stabilizes  tighter  DNA  binding.6  our  results  strongly  suggest  that  H12 
plays  a  key  role  in  receptor  stabilization  associated  with  tight  DNA  binding.  This  observation  is 
also  consistent  with  the  fact  that  deletion  of  H12  can  eliminate  the  ability  of  ERa  to  bind  DNA.6 
Together,  these  findings  indicate  that  HI 2  is  a  key  structural  modulator  in  hormone-activated 
ERa  allostery;  it  mediates  internal  communication  by  bridging  LBD  and  DBD  association, 
through  which  the  re-orientation  of  HI 2,  triggered  by  hormone  binding,  results  in  tighter  DNA 
binding. 

Another  LBD  surface  involved  in  LBD-DBD  interfaces  consists  of  helices  H3,  H4  and  H5 
[labeled  in  blue  in  Lig.  4(A)].  Ligure  4(C)  shows  that  these  helices  make  a  large  number  of 
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contacts  with  DBD  in  states  z,  z'z,  z'z'z,  v,  while  almost  no  contact  occurs  in  state  iv.  More  details, 
including  residue  locations  and  total  contacts,  on  these  interfacial  sites  are  also  illustrated  in 
Figures  S5  and  S6.  Overall,  this  H3-H5  surface,  together  with  H12,  accounts  for  the  stabiliza¬ 
tion  of  dominant  LBD-DBD  conformations. 

It  should  be  noted  that  one  exception  to  this  communication  is  state  /v,  in  which  HI 2  is  not 
involved  in  the  formation  of  its  LBD-DBD  interface.  Figure  4(B)  shows  that  nearly  no  contact 
is  formed  between  H12  and  DBDs  at  state  iv.  More  strikingly,  this  state  has  a  similar  domain 
orientation  regarding  LBD  interface-forming  surfaces  as  observed  in  a  new  crystal  structure  of 
the  nuclear  receptor  homodimer  HNF4a  (PDB  entry  4IQR)47  (Fig.  S9),  where  the  top  regions  of 
LBD  surfaces,  instead  of  the  region  of  HI 2  and  H3-H5,  are  involved  in  the  interface  formation 
in  both  conformations.  While  there  is  a  difference  in  their  corresponding  DNA  segments  (an  in¬ 
verted  repeat  for  ERa  and  a  direct  repeat  for  HNF4a),  this  result  highlights  that  the  simulations 
are  capable  of  capturing  the  large-scale  domain  organizations. 

The  DBD  surfaces  involved  in  forming  interfaces  vary  from  state  to  state.  Figure  4(D) 
highlights  those  interacting  residues  on  DBD  surfaces  that  form  multiple  contacts  with  its  LBD 
partner.  Among  all  of  these  states,  the  surface  itself  mainly  consists  of  a  loop  connecting  the  first 
two  helices  as  well  as  a  large  portion  of  the  third  helix.  The  key  difference,  among  states  z,  ii,  zzz, 
and  v,  is  that  different  sets  of  DBD  residues  are  utilized  for  the  interfacial  packing,  resulting 
in  distinct  LBD-DBD  orientations.  This  large-scale  difference  of  domain  orientations  could  be 
examined  by  small-angle  scattering  measurements,  which  was  not  attempted  here  but  will  be 
reported  in  future  communications.  In  addition,  both  DBD  monomers  interact  with  LBDs  in 
states  i  —  iv  [Fig.  4(D)]. 

This  observation  that  both  DBD  monomers  can  interact  with  LBDs  suggests  a  possible  role 
of  LBD  in  receptor  stabilization.  It  has  been  suggested  that  the  interactions  between  two  DBD 
monomers  are  relatively  weak.48  Here,  based  on  the  simulation  results,  their  interactions  with 
LBDs  can  be  rather  strong  with  a  gain  of  energy,  E'lbd-dbd  >  30  kcal/mol  (Fig.  3).  This 
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stabilization  via  LBD  interactions  is  also  consistent  with  the  observation  that  the  binding  of 
receptor  to  DNA  is  tighter  in  the  presence  of  hormone-bound  LBD  compared  to  DBD  alone.6 
These  results  suggest  that  hormone-bound  LBD  dimers  can  serve  as  a  scaffold  to  position  the 
DBD  dimer,  resulting  in  the  stabilization  of  the  DBD-DNA  complex  and  an  increase  in  DNA 
binding.48  This  notion  is  further  supported  by  the  observed  increase  of  DNA  binding  affinity 
when  two  DBD  monomers  are  connected  by  an  antibody  or  a  linker.48  Thus,  it  is  likely  that  the 
presence  of  LBDs  stabilizes  receptor  binding  to  DNA,  similar  to  the  antibody  or  the  linker,  in 
which  the  LBD  scaffold  is  essential  for  the  correct  positioning  of  the  DBD  dimer. 

Refinement  by  all-atom  simulations 

To  examine  the  stability  and  detailed  features  of  ERa  structures,  all-atom  MD  simulations  were 
performed  starting  from  the  CG-predicted  states  (see  details  in  Methods).  Figure  5(A)  shows 
that  all  five  states  maintain  a  RMSD  value  of  3  A  (relative  to  their  corresponding  starting  config¬ 
urations),  suggesting  that  each  state  is  significantly  stable  within  its  15-ns  simulation  window. 
This  is  further  illustrated  by  the  superposition  of  the  CG  and  all-atom  models  [Fig.  5(B)].  While 
large  displacement  (up  to  3  A)  can  occur  at  the  loops  or  domain-connecting  hinge  loops,  little 
deviation  is  observed  at  the  domain  interfaces  or  the  core  regions  of  each  domain.  Thus,  these 
all-atom  simulations  supports  the  overall  assessment  of  predicted  conformational  states  by  CG 
simulations. 

Close  proximity  of  lysine  residues  at  the  DBD-LBD  interfaces  are  observed  via  detailed 
analyses  of  all-atom  simulation  data.  Figure  5(B)  shows  that  several  lysine  residues  are  within 
a  distance  of  10  A  (e.g.,  K244-K520  at  state  /,  K244-K362  at  state  //,  K231-K303  or  K231-K362  at 
state  Hi,  and  K213-K492  at  state  iv).  This  observation  indicates  that  these  lysine  residues  could 
serve  as  candidate  sites  of  chemical  cross-linking  for  experimental  validation.  It  should  be 
noted  that  the  close  proximity  of  lysine  residues  is  observed  under  a  rather  low  salt  condition 
[Fig.  S  10(A)].  At  higher  salt  concentrations  (e.g.  under  a  physiological  condition),  negative  ions 
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could  further  aid  favorable  interactions  between  positively-charged  lysine  residues.  In  addition, 
alternative  residues,  such  as  tyrosine  and  cysteine,  could  be  used  in  cross-linking  experiments 
based  on  their  pair-wise  distances  (shown  in  Fig.  S10). 

Conclusion 

A  generalized  PPR-CGMD  simulation  pipeline  has  been  applied  to  the  investigation  of  internal 
interactions  between  the  ligand-  and  DNA-binding  domains  of  ERa.  Projection  of  simulation 
data  onto  a  three-dimensional  energy  globe  reveals  multiple  LBD-DBD  orientations  that  are 
energetically  stable,  instead  of  a  singular  stable  conformation.  This  conformational  multiplicity 
is  mainly  due  to  the  long  LBD-DBD  hinge,  whose  intrinsic  flexibility  allows  each  domain  to 
move  freely  in  three  dimensions.  Given  the  increased  demands  for  structural  knowledge  about 
multi-domain  proteins49, 50, 51 ,  it  appears  that  PPR-CGMD  simulations  can  be  broadly  applicable 
to  the  exploration  of  domain  interactions,  e.g.,  within  a  multi-domain  complex. 

Structural  refinement  is  achieved  by  employing  atomic-level  simulations  starting  from  CG- 
predicted  structural  models.  This  multi-scale  approach  has  provided  a  stability  check  for  these 
CG  models.  It  also  reveals  structural  features,  especially  at  the  domain  interfaces,  which  could 
be  used  for  validation  designs  using  SAXS  and/or  cross-linking  measurements. 

There  are  two  major  findings  revealed  from  these  simulations.  First,  among  identified  ERa 
conformations,  the  activation  helix  H12  is  critical  in  mediating  the  cross-talk  between  LBD  and 
DBD,  where  four  out  of  five  stable  states  utilize  H12  as  part  of  the  domain  interface.  This  result 
suggests  that  HI 2  is  not  only  responsive  upon  ligand  binding,  as  observed  in  crystallographic 
studies  of  LBD  alone17, 19, 20,  but  also  critical  for  receptor-DNA  binding.  Second,  the  stabiliza¬ 
tion  of  LBD-DBD  interaction  via  multiple  LBD  surfaces  also  suggests  a  new  role  of  LBD  as  a 
scaffold  for  the  stabilization  of  the  DBD-DNA  complex,  as  observed  in  in  vitro  studies  where 
an  increase  of  DNA  binding  is  observed  due  to  the  presence  of  LBD.  Together  with  previous 
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observations,  our  findings  suggest  a  molecular  model  for  this  critical  hormone  signaling,  where 
the  LBD,  when  bound  to  estradiol,  stabilizes  the  DBD  dimer  for  tight  DNA  binding  via  LBD 
surfaces  including  HI 2. 

Finally,  it  is  worth  noting  that  the  focus  here  was  to  identify  receptor  conformations  that 
are  energetically  stable.  This  focus  was  inspired  by  recent  observations  that  formation  of 
close  LBD-DBD  interface  in  other  NHR  family  members22, 24,  as  a  result  of  the  formation  of 
compact  NHR  conformations.  However,  it  should  also  be  noted  that  an  elongated/extended 
receptor  conformation  has  been  observed  for  NHR  heterodimers  in  solution  or  at  cryogenic 
temperatures.52, 53  The  possibility  of  a  similar  extended  conformation  for  ERa  will  be  explored 
in  future  studies. 
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Figure  Legends 


Figure  1 :  Schematic  diagram  of  ERa  functional  domains  of  a  ligand-binding  domain  (LBD) 
and  a  DNA-binding  domain  (DBD).  LBD  and  DBD  dimers  are  displayed  in  green  and  blue, 
respectively,  and  their  connecting  hinge  loops  are  marked  by  gray  dots.  The  DBD-specific 
DNA  duplex  is  shown  as  ribbon  in  white.  The  activation  helix  HI 2  is  highlighted  in  red.  Part 
of  this  figure  was  rendered  using  UCSF  Chimera.54 


Figure  2:  Sampling  coverage  using  pull-push-release  (PPR)  in  CGMD  simulations.  (A)  Illus¬ 
tration  of  the  three-dimensional  globe  specifying  LBD-DBD  orientations,  where  LBDs  are  in 
the  center  (in  green)  and  DBDs  are  on  the  surface  (in  blue).  (B)  Sampling  coverage  without  and 
with  using  PPR.  The  globe  is  projected  with  a  single  simulation  trajectory  without  (left)  and 
with  (right)  the  assistance  of  PPR.  In  both  simulations,  a  same  initial  configuration  and  a  sim¬ 
ulation  length  of  125  ns  were  used.  (C)  Total  sampling  coverage  from  all  trajectories  projected 
on  the  globe  where  each  simulation  snapshot  is  represented  by  a  blue  dot.  Shown  is  a  front  view 
and  alternative  views  are  also  available  in  Figure  SI.  The  projected  snapshots  were  taken  only 
from  free  and  released  portions  of  PPR  simulation  cycles  with  a  energy  cutoff  of  Elbd-dbd  < 
0  kcal/mol  and  an  ensemble  of  10  structures  from  each  cycle. 


Figure  3:  Energy  landscape  of  ERa  multi-domain  interactions.  (A)  A  plot  of  LBD-DBD  inter¬ 
acting  energy  Elbd-dbd  vs  ERa-DNA  radius  of  gyration  Rg.  A  total  of  4890  snapshots  were 
used  for  the  plot.  (B)  An  energy  globe  for  the  projection  of  simulation  data  and  the  identifi¬ 
cation  of  energetically  stable  ERa  conformations.  The  globe  is  colored  based  on  the  energies 
of  /:'i  B[)  dbi).  as  indicated  by  the  color  bar  on  the  right.  From  this  energy  plot,  there  are  five 
conformational  states  i,  ii,  iii,  iv,  v  that  are  identified  as  energetically  stable  and  ranked  in  the 
order  of  their  Elbd-dbd  values.  Their  Elbd-dbd  and  Rg values  are  -51.6  ±  1.3  kcal/mol  and 
30.40  ±  0.17  A  for  state  i,  -44.2  ±1.6  kcal/mol  and  29.63  ±  0.10  A  for  state  ii,  -44.4  ±1.1 
kcal/mol  and  28.60  ±  0.08  A  for  state  iii,  -39.8  ±1.9  kcal/mol  and  31.58  ±  0.07  A  for  state  iv, 
and  -37.2  ±1.3  kcal/mol  and  31.72  ±  0.33  A  for  state  v.  A  total  of  9780  snapshots  were  used 
for  the  projection,  which  is  doubled  due  to  the  homodimeric  symmetry.  (C)  Surface  view  of  a 
representative  structure  in  each  ERa  conformational  state.  For  clear  visualization,  their  hinge 
loops  are  not  shown  in  these  surface  views.  An  ensemble  of  structures  for  each  state  are  also 
shown  in  Figure  S4. 
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Figure  4:  ERa  LBD-DBD  interfaces.  (A)  Two  LBD  segments:  helices  H3,  H4,  and  H5  (residue 
350—385)  and  helix  H12  (residue  536—544),  highlighted  in  blue  and  magenta,  respectively.  (B) 
The  interactions  of  helix  H12  with  LBDs  and  DBDs.  The  calculations  of  H12-LBD  contacts  was 
based  on  the  estradiol-bound  LBD  crystal  structure  (PDB  entry  1QKU)20,  and  the  calculations 
of  H12-DBD  contacts  was  based  on  the  criterion  that  the  instantaneous  distance  is  within  a 
factor  of  1.2  of  the  corresponding  distance  for  each  pair  of  amino  acids  as  defined  in  Eq.  (2). 
An  ensemble  of  10  highest  ranked  configurations  based  on  E'lbd-dbd  were  used  to  calculate 
the  standard  deviation  (marked  by  error  bars)  for  each  number  of  contacts.  (C)  The  interactions 
of  helices  H3,  H4,  and  H5  with  LBDs  and  DBDs.  (D)  DBD  surface  involved  in  LBD-DBD 
interactions.  Colored  are  the  DBD  surface  residues  according  to  the  number  of  contacts  for 
each  residue  (marked  by  the  color  bar  below).  Results  for  the  LBD  surfaces  are  also  shown  in 
Figure  S5(A)  and  S6(A). 


Figure  5:  All-atom  simulations  of  five  CG-predicted  ERa  states.  (A)  Simulation  trajectories 
of  five  conformational  states  i-v.  RMSDs  were  calculated  based  on  Ca  atoms  of  the  DBD- 
LBD  complex  (excluding  hinge  loop  and  DNA)  with  reference  to  the  starting  structure  [shown 
in  Fig.  3(C)].  (B)  Comparison  of  all-atom  and  CG  models.  In  each  panel,  a  CG  model  is 
superimposed  with  the  Ca  traces  of  its  all-atom  model  taken  from  the  last  frame  of  the  15 -ns 
simulation  trajectory  (left)  and  the  displacement  between  two  models  is  also  colored  based  on 
their  Ca  distances  (right).  In  addition,  lysine  residues  as  candidate  sites  for  chemical  cross- 
linking  at  the  domain  interfaces  are  highlighted  in  magenta. 
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Supporting  Figure  Legends 


Figure  SI:  Sampling  coverage  achieved  by  PPR-CGMD  simulations.  (A)  A  total  of  145  initial 
configurations  used  in  the  simulations,  each  represented  by  a  blue  dot  on  the  globe.  Sampling 
coverage  projected  into  (B)  a  globe  placing  LBDs  in  the  center  and  DBDs  on  the  surface  and 
(C)  a  globe  placing  DBDs  in  the  center  and  LBDs  on  the  surface. 


Figure  S2:  Illustration  of  simulation  data  used  for  clustering  analysis.  (A)  A  plot  of  RMSD  vs 
simulation  time  for  the  release  unbiased  portions  of  all  the  PPR  cycles.  The  last  frame  in  each 
cycle  was  used  as  a  reference  for  RMSD  calculations.  (B)  A  histogram  plot  of  resulting  RMSDs 
from  the  last  5 -ns  segments  (also  highlighted  by  a  dashed  box  in  (A)). 


Figure  S3:  (A)  Clustering  analysis  using  different  numbers  of  clusters  ( N  —  300,500, 1000). 
(B)  Projection  of  the  lowest-energy  configurations  in  the  top  25%  (left)  and  top  30%  of  all 
simulation  data. 


Figure  S4:  An  ensemble  of  structures  representing  each  ERa  state.  A  set  of  10  configurations 
was  selected  from  each  PPR  cycle  with  the  lowest  E'lbd-dbd  values  within  each  state.  LBDs 
are  colored  in  green,  DBDs  are  in  blue  and  the  DNA  duplex  is  in  white. 


Figure  S5:  LBD  surface  involved  in  LBD-DBD  interactions.  Colored  are  the  LBD  surface 
residues  involved  in  the  interactions  (A)  with  its  DBD  partners  at  state  iv  and  (B)  with  a  peptide 
antagonist  as  revealed  from  a  crystal  structure  (PDB  entry  2BJ4).45  Each  residue  on  the  LBD 
surface  is  colored  according  to  the  color  bar,  based  on  the  number  of  contacts  it  can  make 
with  its  DBD  or  peptide  partners,  where  a  contact  was  considered  formed  if  the  instantaneous 
distance  is  within  a  factor  of  1.2  of  the  corresponding  distance  for  each  pair  of  amino  acids  as 
defined  in  Eq.  (2). 


Figure  S6:  Comparison  of  LBD  surfaces  involved  in  LBD-DBD  interactions.  Colored  are  the 
LBD  surface  residues  involved  in  the  interactions  (A)  with  its  DBD  partners  at  states  i,  n,  m,  v 
and  (B)  with  the  GRIP-1  peptide  from  a  co-activator  as  observed  in  a  crystal  structure  (PDB  ID 
entry  3ERD).19  Each  residue  on  the  LBD  surface  is  colored  based  on  the  number  of  contacts 
it  can  make  with  its  DBD  or  peptide  partners.  A  similar  contact  definition  was  used  as  in 
Figure  S5. 
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Figure  S7:  Decomposition  of  the  interaction  energy  into  contribution  from  electrostatic  (white 
bars)  and  hydrophobic  (black  bars)  interactions  for  states  i-v.  The  height  of  the  bar  is  the  average 
calculated  from  the  ensemble  of  each  state,  and  the  error  bar  is  the  standard  deviation. 


Figure  S8:  Mutations  in  the  helix  HI 2  destabilize  ERa  domain  interactions.  (A)  Shown  are  sev¬ 
eral  charged  and  hydrophobic  residues  in  the  helix  H12.  Mutations  of  a  charged  residue  E546A 
(B)  and  hydrophobic  residues  (C)  result  in  the  increase  of  domain  interacting  energies.  The 
globe  was  colored  based  on  the  energy  increase  upon  mutation  and  the  energies  were  evaluated 
after  1,000  steps  of  energy  minimization  for  each  structure  with  mutant  sequences. 


Figure  S9:  Comparison  of  state  iv  (left)  with  the  crystal  structure  of  HNF4a  (right;  PDB  entry 
4IQR)47. 


Figure  S10:  Pair-wise  distance  distributions  g(r)  of  cross-linker  residues  between  DBDs  and 
FBDs  in  all  five  states.  Shown  are  the  distributions  for  lysine  residues  between  DBDs  and  FBDs 
(A),  cysteine  residues  in  the  DBD  and  cysteine  in  the  FBD  (B),  lysine  residues  in  the  DBD  and 
tyrosine  residues  in  the  FBD  (C)  and  tyrosine  residues  in  the  DBD  and  lysine  residues  in  the 
FBD  (D).  The  histogram  g(r)  was  calculated  based  on  a  total  of  1,500  snapshots  from  the  15-ns 
MD  simulations. 
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Coarse-Grained  Simulations  of  Protein-Protein  Association:  An  Energy 
Landscape  Perspective 
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ABSTRACT  Understanding  protein-protein  association  is  crucial  in  revealing  the  molecular  basis  of  many  biological  pro¬ 
cesses.  Here,  we  describe  a  theoretical  simulation  pipeline  to  study  protein-protein  association  from  an  energy  landscape 
perspective.  First,  a  coarse-grained  model  is  implemented  and  its  applications  are  demonstrated  via  molecular  dynamics  simu¬ 
lations  for  several  protein  complexes.  Second,  an  enhanced  search  method  is  used  to  efficiently  sample  a  broad  range  of  protein 
conformations.  Third,  multiple  conformations  are  identified  and  clustered  from  simulation  data  and  further  projected  on  a  three- 
dimensional  globe  specifying  protein  orientations  and  interacting  energies.  Results  from  several  complexes  indicate  that  the 
crystal-like  conformation  is  favorable  on  the  energy  landscape  even  if  the  landscape  is  relatively  rugged  with  metastable  confor¬ 
mations.  A  closer  examination  on  molecular  forces  shows  that  the  formation  of  associated  protein  complexes  can  be  primarily 
electrostatics-driven,  hydrophobics-driven,  or  a  combination  of  both  in  stabilizing  specific  binding  interfaces.  Taken  together, 
these  results  suggest  that  the  coarse-grained  simulations  and  analyses  provide  an  alternative  toolset  to  study  protein-protein 
association  occurring  in  functional  biomolecular  complexes. 


INTRODUCTION 

How  proteins  interact  and  associate  into  large  functional 
complexes  is  one  of  the  key  aspects  of  many  biological 
processes.  The  use  of  computational  methods  to  understand 
molecular  details  of  such  protein-protein  interactions  has 
provided  a  powerful  alternative  to  experimental  structural 
characterization,  especially  for  those  who  form  transient 
but  critical  metastable  conformational  states  (1,2).  Pictur¬ 
ing  the  landscape  of  protein-protein  association  is  of  impor¬ 
tance  in  uncovering  the  hidden  areas  of  a  high-dimensional 
configurational  space,  as  well  as  identifying  new  targets, 
e.g.,  using  these  metastable  conformers,  for  therapeutic 
designs. 

Two  prevailing  views  of  protein-protein  association  are 
the  mechanisms  of  lock-and-key  and  induced-fit  (3,4).  In 
the  former  scenario,  proteins  are  treated  as  rigid  bodies, 
whereas  protein  flexibility  due  to  the  intrinsic  dynamics  is 
taken  into  account  in  the  latter  (and  its  generalizations). 
The  general  docking  approach,  driven  by  the  lock-and-key 
mechanism,  significantly  simplifies  the  search  in  the  con¬ 
formational  space  occurring  in  protein-protein  association 
(5-7).  The  search  is  typically  based  on  atomistic  repre¬ 
sentations,  but  has  also  been  successfully  simplified  by 
coarse-grained  (CG)  models  (8,9)  that  can  accelerate  energy 
calculations.  However,  the  rigid  body  treatment  cannot 
meaningfully  account  for  the  intrinsic  protein  flexibility. 
Although  this  problem  can  be  alleviated  to  some  extent  by 
an  after  search  relaxation,  flexibility  is  inherently  required 
for  biomolecules  to  function,  as  recognized  by  the  induced- 
fit  mechanism.  Such  flexibility  can  be  achieved  computa¬ 
tionally  using  a  wide  range  of  methods  including  molecular 
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dynamics  (MD)  simulations  (10).  Although  the  restriction  to 
a  short  timescale  is  a  bottleneck  for  brute-force  atomistic 
simulations,  MD  simulations  combined  with  efficient  search 
methods  have  provided  fruitful  insights  into  protein  folding 
and  dynamics  (10-16). 

Here,  we  explore  this  general  concept  of  the  induced-fit 
mechanism  by  employing  a  combination  of  MD  simulations 
and  a  simplified  CG  protein  model  with  an  emphasis  on  the 
energy  landscape  aspects  of  protein-protein  association.  This 
method  adopts  a  widely  used  structure-based  approach  to 
model  individual  protein  components  (17-22).  It  also  incor¬ 
porates  the  recent  implementation  introduced  by  Kim  and 
Hummer  (8)  accounting  for  nonnative  protein-protein  inter¬ 
actions,  which  would  otherwise  be  lacking  without  knowing 
the  structure  of  the  entire  complex.  Furthermore,  to  avoid 
trapping  due  to  local  stable  complex-forming  conformations, 
an  efficient  search  method  is  introduced  with  a  focus  on 
facilitating  protein  dissociation  and  reassociation. 

The  outline  of  this  work  is  as  follows.  First,  details  of  the 
CGMD  method,  with  a  straightforward  push-pull-release 
(PPR)  sampling  strategy,  are  described  and  tested  on  several 
model  systems.  To  organize  the  large  amount  of  simulation 
data,  a  structure  clustering  scheme  is  applied.  The  resulting 
conformations  are  then  projected  on  a  three-dimensional 
energy  globe  for  visualizing  the  energetics  of  relevant  stable 
conformations.  Finally,  molecular  forces  stabilizing  each 
identified  conformation  are  briefly  described. 

MODELS  AND  METHODS 
Details  of  the  CG  model 

We  used  a  CG  model  where  each  amino  acid  is  represented  by  a  single 
bead  positioned  at  its  Ca  atom.  The  CG  energy  function  for  two  inter¬ 
acting  proteins/domains  (marked  as  1  and  2)  is  formulated  as 
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follows:  E  =  Ei  +  E2  +E 12.  E i  and  E2  are  the  energy  functions  for  each 
protein,  similar  to  the  structure-based  Go-type  potential  (23-28),  whereas 
El2  is  for  the  interactions  between  proteins.  In  a  nutshell,  this  CG  model 
can  be  viewed  as  two  Go-like  proteins  interacting  with  each  other  in  a  non- 
Go-like  fashion. 

Following  Yang  et  al.  (29),  Ex  and  E2  were  modeled  on  the  basis  of  their 
corresponding  crystal  structures.  These  energy  functions  include  the  inter¬ 
actions  for  bond  (Lbond),  angle  (Wangle),  dihedral  (Ldih),  and  native-like 
contacts  modeled  by  Lennard- Jones  (LJ)-type  (£Lj)  potentials.  Specifi¬ 
cally,  E\  =  £^bond  +  Wangle  T“  £-dih  +  Efj,  where  f^bond  =  X^bonds^^T  ^ o)  ’ 

Wangle  =  Wangles ~  ^o)  >  ^dih  =  Edihedrals^  ^  +  cos(+(0  —  0o))]-  r’ 

6,  and  cp  are  the  instantaneous  bond  distances,  angles,  and  dihedral  angles, 
respectively;  r0,  0O,  and  <pQ  are  the  corresponding  values  in  the  reference 
structure.  We  note  that  the  concept  of  dihedral  angle  among  four  resi¬ 
dues  was  used  mainly  for  the  convenience  of  modeling.  Force  constants 
kb  =  100  kcal/(mol-  A2),  kg  =  20  kcal/(mol-rad2),  k^  =  1.0  kcal/ 

2  (3)  2  ^ 

(mol -rad  ),  and  k^  ’  =  0.5  kcal/(mo-  rad  )  were  used.  The  LJ-type  interac¬ 
tions  for  native  contacts  were  used  between  resides  i  and  j  (>/  +  4), 
Eu  =  Y,i.fo  [5«}-Ay)12  -  6 {v-j/rij)10},  where  e0  =  1  kcal/mol,  rVj  is  the 
residue-residue  distance,  and  o°j  is  the  corresponding  distance  in  the  refer¬ 
ence  structure.  The  definition  of  a  native  contact-forming  pair  was  based  on 
atomically  detailed  calculations  using  the  CSU  software  (30). 

The  energy  function  £12  is  designed  for  nonnative  interactions  between 
two  proteins,  which  were  extended  from  the  Kim-Hummer  model  (8).  It 
includes  the  electrostatic  (£eiec)  and  hydrophobic  (EH)  components, 

E\2  =  Ee  tec  +  Eh.  (1) 

We  used  Ee\ec  —  '^ijqiqj  j  (AireoD^rij)  where  qt  is  the  charge  of  residue  i 
and  e0  is  the  vacuum  electric  permittivity.  An  effective  dielectric  coefficient 
£>eff  =  Ds  exp(r ij/%)  is  applied  to  reflect  the  shielding  effect  between  two 
residues  separated  by  a  distance  of  r,y,  where  Ds  =  10  was  used  to  describe 
the  local  dielectric  environment  when  two  proteins  are  forming  an  interface, 
and  (f  =  8.2  A  to  mimic  the  screening  effect  at  -150  mM  salt  concentration. 
At  pH  7,  residue  charges  qi  =  +  e  for  Lys  and  Arg,  —  e  for  Asp  and  Glu, 
and  +0.5  e  for  His  ( e  is  the  elementary  charge)  were  used  (8).  Hydrophobic 
interactions  (£H)  are  either  attractive  (LJ-type)  {eq  <  0)  or  purely  repulsive 
(. £ij  >  0)  where 


eij  =  a(e™+p).  (2) 

^J(c0)  is  the  Miyazawa-Jernigan  (MJ)  statistical  energy  between  resi¬ 
dues  i,  j  (31).  0  (in  unit  of  k^T)  is  used  as  an  offset  parameter  to 
balance  attractive  and  repulsive  interactions,  and  a  to  scale  Eu  relative 
to  Ee iec  (8).  We  used  Eu(ij')  =  |eiy|[5(<7Iy/rIy)12-  6((jy/>+)10]  if  £//  <  0, 
and  Eu(iJ)  =  eij[5(aij/rij)12(l  -  exp (-(r,y  -  +y)/  d)2)]  if  e,y> 0,  where 
d  =  3.8  A.  A  scaling  factor  of  y  is  introduced  for  oq  as  follows, 

(Tij  =  y{n  +  rj),  (3) 

where  r,-  is  the  van  der  Waals  radius  of  residue  i  as  used  in  (8).  Note  that  in 
this  parameter  set,  both  and  <7,y  can  vary  between  different  pairs  of  resi¬ 
dues  reflecting  the  nature  of  sequence  dependency.  Finally,  E\2  is  accounted 
only  for  surface  residues  with  solvent  accessible  surface  area  >10  A2, 
which  was  calculated  via  atomically  detailed  model  structures  of  individual 
proteins  using  a  probe  size  of  1 .4  A. 


Simulation  and  sampling 

The  CG  model  was  implemented  using  Langevin  MD  simulations  in  a  modi¬ 
fied  version  of  CHARMM  (32).  Simulations  were  performed  at  300  K  with 
a  friction  coefficient  of  50  ps-1  (33).  A  simulation  time  step  of  0.01  ps  was 
used  and  coordinates  were  saved  every  100  ps. 

A  PPR  sampling  strategy,  illustrated  in  Fig.  1,  was  implemented  using 
a  biasing  potential  £PPR  (Eq.  4).  The  PPR  sampling  repeats  a  cycle 
including  the  following  three  parts:  i),  pull  the  two  proteins  away  from 
each  other  when  they  are  close,  ii),  push  them  closer  when  they  are 
separated  by  more  than  a  threshold  distance  Rc,  and  iii),  release  them 
to  interact  freely  by  removing  the  biasing  potential.  We  used  the 
following  £ppr, 

_  /  0,  when  Rt  <RC  and  rmin<r0, 

PPR  \  k(R  —  Rt)2 ,  otherwise 


C  D 


FIGURE  1  PPR  sampling  scheme.  (A)  Three 
parts  of  a  PPR  cycle  where  the  pull  and  push 
portions  are  colored  in  pink  and  the  release  in 
green.  The  target  trajectory  (. Rt )  is  shown  in  solid 
line  (Rmin  <  Rt  <  Rmax)-  (B)  A  three-dimensional 
illustration  of  the  PPR  scheme;  one  protein  (in 
blue )  is  positioned  at  the  origin.  The  inner  sphere 
(in  green),  with  a  radius  of  R  =  Rc,  is  the  region 
where  the  biasing  potential  is  turned  off  (see 
Eq.  4);  the  outer  sphere  has  the  radius  of  R  = 
Rmax •  R  is  the  center-to-center  domain  distance 
between  the  two  proteins.  (C)  Plot  of  R  versus 
time  and  ( D )  RMSD  versus  time  from  a  typical 
PPR  simulation  trajectory  for  the  barnase/barstar 
complex,  where  Rmax  =  50  A,  Rc  —  25  A,  Rmin  — 
0  A  were  used. 


20  40  20  40 

Simulation  time  (ns)  Simulation  time  (ns) 


Biophysical  Journal  103(4)  837-845 


Simulating  Protein-Protein  Interactions 


839 


where  R  is  the  instantaneous  center-to-center  distance,  and  Rt  is  the  target 
center-to-center  distance  (marked  by  solid  lines  in  Fig.  1  A).  rmin  is  the 
closest  residue-residue  distance  between  the  two  proteins.  The  value  of 
r0  =  7.6  A  and  a  spring  constant  of  k  =  100  kcal/(mol-  A2)  were  used.  A 
simulation  length  of  10  ns  was  used  for  each  PPR  cycle. 

To  further  improve  search  efficiency,  a  total  of  100  independent  MD  runs 
were  launched  simultaneously,  each  with  an  initial  configuration  generated 
by  translation-and-rotation  of  the  two  proteins  (see  the  Supporting  Mate¬ 
rial).  The  total  simulation  time  was  10  /xs,  which  resulted  in  105  configura¬ 
tions.  Configurations  from  the  unbiased  release  parts  ( highlighted  in  green 
in  Fig.  1,  A  and  B ),  were  used  for  data  analysis. 


Structure  clustering  analysis 

Configurations  from  the  simulated  trajectories  were  grouped  following 
a  two-step  clustering  procedure.  In  the  first  orientational  clustering  step, 
the  entire  complex  was  aligned  based  only  on  the  crystal  structure  of  one 
protein.  Four  out-of-plane  residues  were  then  picked  from  the  other  protein 
and  their  Cartesian  coordinates  were  used  in  a  standard  K-means  clustering 
algorithm  in  MATLAB  (The  MathWorks,  Natick,  MA).  From  the  resulting 
Nc  clusters  (2000  <NC<  3000),  Nc  representative  configurations  with  the 
lowest  E\2  within  each  cluster  were  chosen  and  used  in  the  next  clustering 
step.  In  the  second  root  mean-square  deviation  (RMSD)  clustering  step, 
these  Nc  configurations  from  the  first  step  were  further  clustered  into  Nf 
final  clusters  using  a  pairwise  RMSD-based  protocol  (34)  with  a  RMSD 
cutoff  of  5  A  (for  the  entire  complex).  Similar  to  the  first  clustering  step, 
the  lowest  Ei2  configuration  within  each  cluster  was  selected  to  represent 
the  Nf  clusters.  To  focus  on  the  identification  of  energetically  stable  confor¬ 
mations,  this  two-step  hierarchical  clustering  was  performed  only  on  those 
configurations  with  E\2  <0. 


RESULTS  AND  DISCUSSION 

Here,  we  first  describe  the  CG  simulations  with  the  selection 
of  model  parameters  and  test  them  on  several  well-charac¬ 
terized  complexes.  To  accelerate  the  simulations,  an  effi¬ 
cient  search  method  is  introduced  and  compared  with 


brute-force  simulations.  This  CG  method  is  finally  applied 
to  characterize  the  energy  landscape  of  several  protein 
complexes:  CCP/cc,  E9/Im9,  E7/Im7,  RXR  ligand-binding 
domain  (LBD)  dimer,  and  barnase/barstar.  Their  resultant 
energy  landscapes  are  further  characterized  and  organized 
according  to  the  forces  that  energetically  stabilize  their 
identified  favorable  conformations. 

The  CG  model 

To  reduce  the  degrees  of  freedom  in  atomistic  simulations 
and  overcome  the  timescale  limitation,  a  CG  approach 
was  used  in  our  studies  of  protein-protein  interactions. 
The  CG  model  was  built  on  the  basis  of  available  crystal 
structures  of  individual  proteins  (i.e.,  Go-like  models;  see 
Models  and  Methods).  The  nonnative-like  interactions 
between  proteins  were  effectively  accounted  for  and  opti¬ 
mized.  Here,  the  first  optimization  is  about  two  CG  param¬ 
eters  (a  and  (3  in  Eq.  2)  used  to  balance  the  competition 
between  hydrophobic  and  electrostatic  interactions.  To 
achieve  this  goal,  brute-force  CGMD  simulations  (without 
a  biasing  potential)  were  carried  out  on  two  protein  com¬ 
plexes  whose  crystal  structures  are  available.  One  is  the 
barnase/barstar  complex  of  bacterial  ribonuclease  and  its 
inhibitor  (PDB  entry  1BRS)  (35),  and  the  other  is  E9/Im9 
(PDB  entry  1EMV),  an  immunity  protein  complex  (36). 
To  examine  their  energetic  stability,  a  range  of  CG  parame¬ 
ters  (a  =  0.2,  0.4,  0.6  and  (3  =  0.8,  1.3,  1.9)  were  used  for 
comparison. 

Fig.  2  shows  the  two-dimensional  histogram  plots  of  the 
center-to-center  distance  between  two  proteins  ( R )  versus 
RMSD  of  the  entire  complex  (with  respect  to  the  crystal 
structure).  A  total  of  nine  sets  of  CG  simulations,  each 


barnase/barstar  E9/lm9 


FIGURE  2  Selection  of  CG  model  parameters. 
(A)  Histogram  plots  of  R  (domain  distance)  versus 
RMSD  for  barnase/barstar  (PDB  entry  1BRS  (35)) 
and  (B)  for  E9/Im9  (PDB  entry  1EMV  (36)).  A 
range  of  a  and  (Eq.  2)  are  used  in  each  simulation 
set.  For  each  set,  10  independent  simulation  runs, 
each  lasting  100  ns,  were  carried  out  starting 
from  their  corresponding  crystal  conformations 
shown  above,  where  R  =  23.2  A  and  27.4  A, 
respectively.  The  parameter  y  =  0.625  (Eq.  3)  is 
used  throughout  this  work  unless  specified. 
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with  a  distinct  set  of  a  and  /?,  were  performed  starting  from 
the  same  crystal  configuration.  Comparison  shows  that  the 
complex  remains  stable  for  the  set  of  a  =  0.4  and 
jS  =  1.3;  any  deviation  tends  to  result  in  a  destabilization 
of  the  crystal  conformation  and  the  complex  starts  to  disso¬ 
ciate.  This  set  of  parameters  were  also  tested  on  three  other 
complexes  shown  in  Fig.  3,  where  each  complex  stays 
within  a  reasonable  RMSD  range  to  its  crystal  conformation 
within  a  simulation  window  of  100  ns.  Additional  energy 
calculations  on  protein-protein  interaction,  averaged  over 
the  100-ns  CGMD  simulations  with  a  =  0.4  and  (3  =  1.3, 
show  that  the  values  of  Ei2  are  -14.7  and  -20.1  kcal/mol 
for  bamase/barstar  and  E9/Im9,  respectively.  These  results 
are  consistent  with  their  measured  binding  enthalpies  of 
—  13.9  and  —19.1  kcal/mol  (37,38)  (Fig.  SI  in  the  Support¬ 
ing  Material).  Taken  together,  these  results  suggest  that  this 
CG  energy  function,  even  though  highly  simplified,  can 
provide  a  rather  detailed  energy  evaluation  on  protein- 
protein  interactions. 

Another  feature  of  the  CGMD  simulations  is  the  introduc¬ 
tion  of  a  scaling  factor  y  (Eq.  3)  to  account  for  realistic  pair¬ 
wise  residue  distances.  This  consideration  is  in  part  based  on 
the  observation  that  a  typical  hydrophobic  pair  of  Leu-Ile 
and  a  typical  charged  pair  of  Asp-Asp  has  an  optimal 
distance  around  7.8  A  and  7  A,  respectively  (39);  these  are 
substantively  lower  than  their  values  of  12.4  A  and  11.2  A, 
based  on  their  van  der  Waals  radii  (8).  To  account  for  such 
a  difference,  the  value  of  y  =  0.625  was  selected  for  rescal¬ 
ing  (Eq.  3).  To  illustrate  and  compare  the  difference,  two  sets 
of  simulations,  one  with  y  =  0.625  and  the  other  with  y  = 
1.0,  were  performed  for  bamase/barstar.  Fig.  S2  shows 
that,  within  the  same  length  of  simulation  time,  the  complex 
drifts  away  from  its  crystal  conformation  during  the  simula¬ 
tions  with  y  =  1.0,  but  is  retained  during  the  ones  with  y  = 
0.625.  This  is  systemically  observed  in  the  simulations  of 
other  complexes  used  in  this  work  (data  not  shown).  Thus, 
this  CG  parameter  y  =  0.625,  together  with  a  =  0.4  and 
(3  =  1.3,  is  used  for  the  rest  of  the  CGMD  simulations. 


Efficient  search  method 

As  demonstrated  previously,  it  is  difficult  for  brute-force 
CGMD  simulations  to  observe  protein  dissociation  events 
once  two  proteins  are  associated.  One  goal  here  is  to  search 
for  multiple  available  conformations,  which  would  require 
a  more  complete  search  in  the  configurational  space.  In 
fact,  several  advanced  sampling  techniques  have  been 
developed  in  the  past  to  address  this  quest  (1 1—14, 16,40— 
45).  In  a  similar  spirit,  a  PPR  sampling  strategy  is  imple¬ 
mented  here  to  accelerate  sampling  different  interactions. 
Specifically,  a  biasing  potential  (see  Eq.  4)  is  first  applied 
to  pull  and  push  the  two  proteins  to  facilitate  protein  disso¬ 
ciation  and  reassociation,  respectively;  this  bias  is  then 
removed  and  the  proteins  are  released  to  interact  freely 
when  they  are  close  enough.  We  repeated  this  PPR  cycle 
to  achieve  sufficient  sampling  (Fig.  1). 

Fig.  1,  C  and  D,  illustrates  a  typical  PPR  trajectory  from 
the  simulations  of  barnase/barstar.  It  shows  that  barnase  and 
barstar  dissociate  and  reassociate  as  seen  in  the  center-to- 
center  distance  and  RMSD  (with  respect  to  the  crystal  struc¬ 
ture)  during  the  push  and  pull  parts  of  the  PPR  cycle  ( pink 
regions  in  Fig.  1  A).  Once  the  two  proteins  are  close  enough, 
the  associated  complex  is  further  relaxed  by  free  MD  simu¬ 
lations  without  any  bias  imposed  during  the  release  portion. 
We  note  that  simulation  data  only  from  these  free  release 
portions  were  used  for  the  rest  of  the  analysis. 

It  is  observed  that  this  PPR  scheme  significantly  increases 
the  search  efficiency  for  different  protein-protein  interacting 
conformations.  Taking  CCP/cc  for  example,  with  the  help 
from  PPR,  a  larger  RMSD  (with  respect  to  the  starting 
structure)  range  (up  to  20  A)  and  a  much  broader  configura¬ 
tion  space  is  sampled  (Fig.  S3);  in  contrast,  the  complex 
remains  in  a  crystal-like  conformation  without  using  PPR 
(Fig.  3  A),  and  is  confined  near  its  staring  point  in  these 
brute-force  simulations  (Fig.  S3).  This  increased  sampling 
efficiency  is  also  systematically  observed  in  other  systems 
used  in  this  work.  To  further  enhance  sampling,  a  set  of 


CCP/cc 


RXR 


E7/lm7 


FIGURE  3  CG  model  parameters  are  tested  on 
three  protein  complexes.  Two-dimensional  histo¬ 
gram  plots  of  R  (domain  distance)  versus  RMSD 
from  the  simulations  with  a  —  0.4  and  (3  —  1.3 
kBT  are  shown.  Crystal  conformations  are  intact 
for  (A)  CCP/cc  (PDB  entry  2PCC  (46)),  (. B )  RXR 
LBD  (PDB  entry  1MZN  (64)),  and  (Q  E7/Im7 
(PDB  entry  7CEI  (52))  complexes.  The  simula¬ 
tions  lasted  100  ns  for  each  protein  complex. 
This  set  of  parameters  a  =  0.4  and  (3  =  1.3  kBT 
is  used  throughout  this  work  unless  specified. 
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100  independent  MD  runs  were  launched  simultaneously 
each  starting  with  a  random  orientation  between  the  two 
proteins  in  the  complex.  As  shown  below,  a  wide  range  of 
conformations  are  sampled  for  the  protein  complexes  we 
examined.  Thus,  an  efficient  search  method  via  the  PPR 
strategy  is  in  place  to  enhance  the  sampling  of  protein- 
protein  association. 

Protein  interacting  landscape  pictured  by  an 
energy  globe 

To  organize  the  large  amount  of  simulation  data,  structurally 
similar  conformations  were  clustered  using  a  two-step  struc¬ 
ture  clustering:  the  first  step  is  based  on  their  relative  orien¬ 
tation  and  the  second  is  based  on  pairwise  RMSD  (see 
Methods).  Typically,  clustering  is  performed  only  using 
pairwise  RMSD  (34);  this  would  be  computationally  ex¬ 
pensive  because  configurations  generated  from  CGMD 
simulations  are  on  the  order  of  105.  This  two-step  procedure 
overcomes  this  hindrance  by  grouping  the  number  of  config¬ 
urations  to  the  order  of  103  clusters  after  the  first  step  and 
finally  to  Nf  clusters  in  the  order  of  102. 

To  assist  the  navigation  of  the  conformational  diversity, 
we  projected  these  Nf  configurations  onto  a  unit  sphere  or 
globe  representing  relative  orientation  between  complex¬ 
forming  proteins.  As  illustrated  in  Fig.  1  B ,  one  can  imagine 
that  one  protein  is  inside  the  globe,  whereas  the  other  pro¬ 
tein  takes  different  orientations  on  the  surface;  the  globe 
was  colored  according  to  the  interacting  energy  E]2  (Eq. 
1).  We  found  that  this  energy-mapped  globe  is  a  useful 
tool  to  identify  energetically  favorable  conformations  with 
different  protein  positioning  on  the  landscape.  It  also  serves 
to  access  the  sampling  quality  achieved  by  PPR-assisted 
CGMD  simulations  by  examining  the  coverage  on  the  globe 
surface.  Discussed  below  are  five  protein  complexes 


(CCP/cc,  E9/Im9,  E7/Im7,  RXR  LBD  dimer,  and  bamase/ 
barstar)  studied  in  this  work;  we  organized  them  according 
to  the  decomposition  of  the  interacting  energy  into  electro¬ 
static  and  hydrophobic  components. 

CCP/cc:  electrostatics-driven  association 

To  demonstrate  the  application  of  CGMD  simulations,  the 
CCP/cc  complex  was  first  examined;  a  similar  procedure 
was  followed  for  other  protein  complexes.  CCP/cc  is  a 
complex  formed  between  cytochrome  c  peroxidase  and 
cytochrome  c,  whose  crystal  structure  is  shown  in  Fig.  3  A 
(46).  Calculated  from  the  PPR-assisted  CGMD  simulation 
data,  Fig.  4  A  shows  a  plot  of  Ei2  versus  RMSD  (with 
respect  to  the  crystal  structure).  It  clearly  shows  that  a 
wide  RMSD  range  is  sampled  for  CCP/cc.  A  close  examina¬ 
tion  also  shows  that  the  crystal-like  configurations  have 
a  lower  2s  12,  indicating  that  the  CG  energy  function  captures 
the  molecular  forces  stabilizing  the  crystal-like  conforma¬ 
tion.  It  is  worth  noting  that  such  a  correlation  between  £12 
and  RMSD,  where  the  low  EX2  conformations  are  funneled 
into  low  RMSD  regions  in  the  context  of  protein-protein 
interactions,  somewhat  resembles  the  funnel-like  shape 
recognized  in  protein  folding  (47-49). 

The  resultant  Nf  =  191  conformations  after  a  two-step 
clustering  were  projected  into  an  energy  globe  shown  in 
Fig.  4  B.  It  is  fairly  easy  to  locate  conformational  states 
that  are  energetically  stable  or  metastable.  Marked  by  arrows 
on  Fig.  4  B  are  four  identified  lowest  EX2  conformations,  each 
of  which  consists  of  an  ensemble  of  five  configurations. 
Among  them,  conformation  (a)  has  a  very  similar  CCP/cc 
binding  interfaces  to  the  crystal  configuration  (within  3  A 
of  RMSD),  suggesting  that  the  crystal  structure  is  favored 
in  the  CG  energy  evaluation.  Other  alternative  conformations 
(b-d)  are  metastable  in  Ei2,  which  differ  in  either  binding 
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FIGURE  4  Energy  landscape  of  the  CCP/cc 
complex.  (A)  A  plot  of  EX2  versus  RMSD.  A  total 
of  44,45 1  configurations  from  CGMD  simulations 
are  shown  in  black  dots,  where  Rmax  =  90  A,  Rc  = 
40  A,  and  Rmin  =  0  A  were  used  in  the  PPR  scheme. 
( B )  Front  view  of  the  energy  globe  colored  by  EX2. 
A  total  of  Nf  =  191  clusters  were  obtained  (also 
shown  in  blue  dots  in  A)  after  the  two-step  clus¬ 
tering.  Four  representative  conformations  ( a-d) 
are  shown  on  the  globe,  where  each  conformation 
is  represented  by  an  ensemble  of  five  lowest  energy 
configurations.  Conformation  (a)  with  the  lowest 
E 12  resembles  the  crystal  structure  (46).  Note  that 
conformations  with  lower  EX2  are  not  observed  in 
the  back  of  this  globe.  (C)  The  decomposition  of 
hydrophobic  ( white  bars )  and  electrostatic  ( gray 
bars)  energies  from  Ei2  for  each  conformation 
(a-d).  Their  averages  (and  standard  deviations) 
were  calculated  from  the  ensemble  of  five  configu¬ 
rations  shown  in  B. 
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interfaces  or  relative  orientations.  For  example,  cytochrome 
c  (in  red )  is  rotated  in  conformation  (c)  away  from  the 
crystal-like  conformation  (a).  Similar  to  what  is  observed 
in  protein  folding,  these  metastable  conformations  may  serve 
as  important  intermediate  states  right  before  CCP/cc  forms 
the  crystal-like  conformation,  either  thermodynamically  or 
kinetically. 

To  evaluate  the  molecular  forces  driving  toward  the  stable 
conformation,  electrostatic  and  hydrophobic  energies  were 
examined  separately  according  to  Eq.  1.  Fig.  4  C  shows 
both  components  in  these  conformations  ( a-d )  where  elec¬ 
trostatics  dominate  over  hydrophobic  contributions  in  the 
total  energy  Exl.  This  is  consistent  with  that  observed  in 
the  structural  analysis  of  the  CCP/cc  complex  (46).  After 
all,  electrostatic  interactions  are  known  to  play  an  important 
role  in  protein-protein  interactions  (50,51).  In  addition,  the 
crystal-like  conformation  (a)  has  stronger  hydrophobic  in¬ 
teractions  compared  to  others  ( b-d ),  thus  suggesting  a 
possible  role  of  hydrophobic  interactions  in  the  specificity 
of  its  crystal-like  binding  interface. 

This  phenomenon  of  electrostatics-driven  protein  associ¬ 
ation  is  also  observed  in  two  other  complexes:  E9/Im9  and 
E7/Im7,  two  immunity  proteins  in  the  form  of  specific  bac¬ 
terial  toxin/inhibitor  complexes  (36,52).  These  two  com¬ 
plexes  have  similar  crystal  structures  (shown  in  Figs.  2  B 
and  3  C)  with  a  sequence  identity  of  50%.  Their  PPR- 
CGMD  simulation  results  exhibit  a  funnel-shaped  plot  of 
Ei2  versus  RMSD  (Fig.  S4  A  and  Fig.  S5  A),  similar  to 
what  is  seen  in  CCP/cc  (Fig.  4  A).  Each  crystal-like  confor¬ 
mation  has  the  lowest  EX2  on  the  energy  globe  for  both 
protein  complexes  (Fig.  S4  B  and  Fig.  S5  B ),  respectively. 
Four  conformations  (a-d)  with  low  EX2  identified  from 
the  energy  globe  have  similar  protein  binding  interfaces 
on  the  immunity  protein  (in  blue),  although  their  internal 
orientations  are  different.  For  their  energy  decomposition 


(Fig.  S4  C  and  Fig.  S5  C),  it  appears  that  E7/Im7  has 
much  higher  electrostatic  energy  than  E9/Im9.  This  differ¬ 
ence  is  mainly  due  to  more  charged  residues  involved  at 
the  E7/Im7  interface.  It  was  also  observed  from  mutation 
studies  that  a  tyrosine  (Tyr-54  of  Im9)  residue  at  the  hydro- 
phobic  core  of  the  binding  interface  is  important  for  the 
stability  of  the  E9/Im9  complex  (36),  which  might  con¬ 
tribute  to  the  less  dominant  hydrophobic  interactions. 

RXR  LBD  dimer:  hydrophobics-driven 
association 

A  different  molecular  driving  force  is  observed  in  the  forma¬ 
tion  of  the  LBD  dimer  of  a  nuclear  receptor  RXR.  Fig.  5  A 
shows  the  plot  of  E12  versus  RMSD  (with  respect  to  the 
crystal  conformation)  for  this  dimer,  which  is  more  rugged 
compared  to  the  CCP/cc  complex  in  Fig.  4  A.  The  plot 
is  similar  to  the  CCP/cc  complex  where  the  crystal-like 
conformation  is  favored,  although  conformation  (d)  has 
comparable  EX2  with  the  crystal-like  conformation  (a) 
(Fig.  5,  A  and  B).  In  addition,  these  two  conformations  (a) 
and  (d)  are  close  on  the  energy  globe,  and  differ  only 
by  7  A  in  RMSD  (Fig.  5  B),  suggesting  that  conformation 
(d)  may  serve  as  an  intermediate  to  the  formation  of  the 
crystal-like  conformation.  It  is  also  clear  that  hydrophobic 
interactions  generally  dominate  the  RXR  dimer  interface, 
especially  in  the  crystal-like  conformation  (a)  (Fig.  5  C). 
Among  all  the  four  conformations,  one  exception  is  confor¬ 
mation  (c)  where  the  second  LBD  (in  red)  rotates  away  and 
binds  at  distant  sites.  A  close  examination  on  LBD  binding 
interfaces  in  the  crystal  structure  also  shows  that  parts  of 
the  dimeric  interface  is  weakly  attractive  or  even  repulsive 
locally  in  electrostatics  (data  not  shown),  thus  suggesting 
that  hydrophobic  interactions  are  the  major  molecular  driv¬ 
ing  forces  in  the  dimerization. 
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FIGURE  5  Energy  landscape  of  the  RXR  LBD 
dimer.  (A)  A  plot  of  Ei2  versus  RMSD.  A  total  of 
58,480  configurations  from  CGMD  simulations 
are  shown  in  black  dots,  where  Rmax  =  100  A, 
Rc  =  45  A,  and  Rmin  =  0  A  were  used  in  the  PPR 
scheme.  ( B )  Front  view  of  the  energy  globe  colored 
by  E\2-  We  used  a  total  of  Nf  =  589  clusters  (also 
shown  in  blue  dots  in  A)  after  the  two-step  clus¬ 
tering.  Four  representative  conformations  ( a-d) 
are  shown  on  the  globe,  where  conformation  (a) 
resembles  the  crystal  structure  (64).  (C)  The 
decomposition  of  hydrophobic  ( white  bars )  and 
electrostatic  {gray  bars)  energies  from  Ei2  for 
each  conformation  (a-d). 
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Barnase/barstar:  interplay  between  hydrophobic 
and  electrostatic  interactions 

The  notion  of  a  stable  crystal-like  conformation  is  chal¬ 
lenged  by  simulation  results  of  the  barnase/barstar  complex. 
Although  the  crystal  conformation  displayed  stability  in 
brute-force  simulations  (Fig.  2  A),  PPR-based  CGMD  simu¬ 
lations  show  that  multiple  alternative  conformations  are 
energetically  stable  (Fig.  6  A).  Fig.  6  B  shows  four  conforma¬ 
tions  with  low  E12  (a-d)  on  a  multibasin  energy  globe.  In 
particular,  conformation  ( b )  has  a  similar  Ei2  with  the 
crystal-like  conformation  ( a ),  but  barstar  (in  red)  binds 
at  a  distant  site  from  the  C-shaped  binding  groove  of  bar- 
nase  (in  blue).  Furthermore,  energy  decomposition  shows 
stronger  hydrophobic  interactions  in  conformations  (a)  and 
(d),  whereas  electrostatics  is  stronger  in  conformations  ( b ) 
and  (c)  (Fig.  6  C).  This  suggests  that  an  interplay  between 
both  electrostatic  and  hydrophobic  interactions  is  in  place 
among  these  conformations,  in  accord  with  experimental 
observations  on  a  high  degree  of  both  shape  and  charge 
complementarity  (35). 

These  simulation  results  may  provide  a  possible  explana¬ 
tion  for  previous  mutation  studies.  Structural  analyses  based 
on  the  crystal  structure  show  that  a  set  of  charged  residues 
(Lys-27,  Arg-87,  and  His- 102)  in  barnase  interact  with 
barstar  (35),  as  in  the  conformation  (a).  In  contrast,  a  dif¬ 
ferent  set  of  charged  residues  (Arg-59,  Glu-60,  Lys-62, 
Lys-66,  and  Arg-69)  are  involved  in  conformation  ( b ), 
contributing  to  an  increased  electrostatic  energy.  One  dif¬ 
ference  is  that  Arg-59  is  at  the  core  of  a  network  of  interac¬ 
tions  at  the  binding  interface  in  conformation  (b),  whereas 
it  is  on  the  edge  or  far  away  from  the  core  binding  inter¬ 
faces  in  the  crystal-like  conformation  (a).  Because  of  such 
a  critical  role  of  Arg-59  in  the  conformation  (b),  one  would 
imagine  that  any  disruption  might  affect  its  complex  asso¬ 
ciation,  either  thermodynamically  or  kinetically.  Indeed,  a 


significant  change,  >400  times  in  dissociation  rate,  has 
been  observed  in  a  point  mutation  of  Arg-59  to  Ala  (53). 
Additional  mutations  Asn-58  and  Glu-60  at  the  interface 
also  show  substantial  change  in  the  rate.  This  suggests 
that  conformation  (b),  predicted  from  CGMD  simulations, 
provides  a  structural  basis  for  the  observed  large  rate 
change  upon  mutation.  We  also  note  that  the  barnase  sur¬ 
face  at  the  conformation  ( b )  is  slightly  deformed,  suggesting 
that  induced-fit  helps  achieve  a  better  charge  comple¬ 
mentarity.  Taken  together,  these  results  suggest  that  an 
energy  landscape  view  of  protein-protein  interactions  makes 
the  identification  of  alternative  conformations  in  barnase/ 
barstar  possible,  further  providing  a  sound  structural  basis 
for  mutagenesis. 

CONCLUDING  REMARKS 

We  have  established  a  theoretical  pipeline  to  navigate  the 
energy  landscape  of  protein-protein  association  via  PPR- 
CGMD  simulations.  The  simulations  naturally  permit  and 
account  for  the  flexibility  of  protein  domains  in  the  realiza¬ 
tion  of  induced- fit  mechanisms.  The  use  of  a  PPR  sampling 
scheme  enables  an  exhaustive  search  to  uncover  hidden 
areas  of  the  conformational  space.  An  energy  globe  is 
further  introduced  to  navigate  the  energy  landscape  of 
a  wide  range  of  resultant  conformations.  This  globe  also 
allows  accessing  the  sampling  quality  determined  by  the 
extent  to  which  the  globe  is  covered  by  the  simulation 
trajectories.  Among  four  (out  of  five)  protein  complexes 
we  examined,  their  crystal-like  conformations  are  favorable 
on  the  energy  landscape,  suggesting  that  the  CG  model 
captures  the  basic  features  of  molecular  forces  driving 
protein-protein  association.  One  exception  is  barnase/bar¬ 
star,  where  apart  from  the  crystal-like  conformation,  alterna¬ 
tive  conformations  are  also  energetically  favored. 


FIGURE  6  Energy  landscape  of  the  barnase/bar¬ 
star  complex.  (A)  A  plot  of  E12  versus  RMSD.  A 
total  of  43,554  configurations  from  CGMD  simula¬ 
tions  are  shown  in  black  dots,  where  Rmax—  50  A, 
Rc  =  25  A,  and  Rmin  =  0  A  were  used  in  the  PPR 
scheme.  ( B )  Front  view  of  the  energy  globe  colored 
by  E\2-  We  used  a  total  of  Nf  =  120  clusters  (also 
shown  in  blue  dots  in  A)  after  the  two-step  clus¬ 
tering.  Four  representative  conformations  (a-d) 
are  shown  on  the  globe,  where  conformation  (a) 
with  the  lowest  E\2  resembles  the  crystal  struc¬ 
ture  (35).  (C)  The  decomposition  of  hydrophobic 
(white  bars)  and  electrostatic  (gray  bars)  energies 
from  E\2  for  each  conformation  (a-d). 
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The  ability  of  a  simple  CG  model  to  identify  relevant 
conformations  could  be  due  to  the  relatively  smooth  bind¬ 
ing  free  energy  landscape  for  functional  proteins  (54-57). 
Of  course,  the  inherent  simplifications  in  such  a  simple 
model  cannot  characterize  atomically  detailed  interactions 
(23,58-61);  in  that  case,  the  CG-identified  conformations 
can  be  relaxed  and  used  as  a  starting  point  for  atomistic 
simulations.  In  addition,  the  current  CG  model  has  captured 
the  physical  basis  of  protein-protein  association,  but  it  may 
fail  to  produce  meaningful  results  on  protein-ligand  interac¬ 
tions  where  details  can  matter.  We  also  note  that  folding  and 
unfolding,  which  can  be  coupled  with  the  protein  associa¬ 
tion  process  (54-56,62),  are  not  studied  here.  Furthermore, 
our  focus  is  mainly  on  protein  association  into  compact 
conformations  that  are  energetically  favorable;  the  forma¬ 
tion  of  extended  and  entropically  favorable  conformations, 
or  the  kinetic  process  of  association  itself,  is  not  fully  exam¬ 
ined  here,  but  will  be  illustrated  in  future  communications. 

Finally,  we  wish  to  emphasize  that  this  PPR-CGMD 
simulation  pipeline  can  be  readily  applied  to  those  protein 
complexes  whose  crystal  structures  are  unknown,  especially 
since  considerable  knowledge  about  individual  protein 
subunits  has  been  made  available  after  decades  of  efforts 
(63).  It  is  thus  anticipated  that  this  pipeline  is  positioned 
to  serve  as  an  alternative  approach  to  study  protein-protein 
interactions  on  a  wide  range  of  protein  complexes. 

SUPPORTING  MATERIAL 

Five  figures,  a  table,  and  reference  (65)  are  available  at  http://www. 
biophysj.org/biophysj/supplemental/S0006-3495(12)00787-4. 
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